University of Groningen

Bioinformatics of genomic association mapping
Vaez Barzani, Ahmad

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it.

Document version: Publisher's PDF, also known as Version of record
Publication date: 2015

Citation for published version (APA):
Vaez Barzani, A. (2015). Bioinformatics of genomic association mapping. [Thesis fully internal (DIV), University of Groningen]. University of Groningen.

Copyright
Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons). The publication may also be distributed here under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license. More information can be found on the University of Groningen website: https://www.rug.nl/library/open-access/self-archiving-pure/taverne-amendment.

Bioinformatics of Genomic Association Mapping
Ahmad Vaez

© Ahmad Vaez, 2015. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without written permission from the author. For articles published or accepted the copyright has been transferred to the respective publisher.

Cover design: Kowsar: Ahmad Abdollahi
Page layout: Ahmad Vaez
Printed by: Ipskamp Drukkers BV
ISBN: 978-90-367-8202-9 (print)
ISBN: 978-90-367-8201-2 (e-book)

Bioinformatics of Genomic Association Mapping

PhD thesis

to obtain the degree of PhD at the University of Groningen on the authority of the Rector Magnificus Prof. E. Sterken and in accordance with the decision by the College of Deans.

This thesis will be defended in public on Wednesday 23 September 2015 at 09:00 hours

by

Ahmad Vaez Barzani

born on 11 May 1973 in Isfahan, Iran

Supervisor
Prof. H. Snieder

Co-supervisors
Dr. B.Z. Alizadeh
Dr. I.M. Nolte

Assessment Committee
Prof. H.M. Boezen
Prof. O. Franco
Prof. D.I. Chasman

CONTENTS

Chapter 1 General introduction

Part 1 | How to check if the phenotype is appropriate for genomic association mapping
Chapter 2 Genetic and environmental influences on stability and change in baseline levels of C-reactive protein: a longitudinal twin study

Part 2 | How to perform a GWAS analysis
Chapter 3 lodGWAS: A software package for genome-wide association analysis of biomarkers with a limit of detection
Chapter 4 QCGWAS: A flexible R package for automated quality control of genome-wide association results

Part 3 | How to follow up GWAS results: post-GWAS analyses
Chapter 5 Bivariate genome-wide association study identifies novel pleiotropic loci for lipids and inflammation
Chapter 6 Investigating the causal relationship of C-reactive protein with 32 complex somatic and psychiatric outcomes: A large scale cross-consortia Mendelian randomization study
Chapter 7 An in silico post-GWAS analysis of C-reactive protein loci suggests an important role for interferons

Part 4 | Bioinformatics of genomic association mapping: an A to Z walk-through
Chapter 8 General discussion

Appendices

Chapter 1 | General introduction

Preface

The successful completion of the Human Genome Project marked a milestone for genomics research (1). Likewise, new microarray technologies have become widely available, providing simple and elegant ways of rapidly delivering vast amounts of biological data. Moreover, three major, multi-country projects have been launched in the past few years to improve the understanding of genomic diversity: The International HapMap Project (2), the 1000 Genomes Project (3), and the Encyclopedia of DNA Elements (ENCODE) project (4). All of the data of these projects have been made publicly available to the scientific community (Box 1, (5)).
However, generating and storing vast amounts of biological data only makes sense if the data are used creatively in further research to produce novel insights into biological mechanisms (6). Bioinformatics is an interdisciplinary field that addresses this growing need, and it has consequently been one of the most dynamically evolving fields of science in the past few years. Wikipedia describes bioinformatics as an interdisciplinary field that ‘develops’ and ‘applies’ methods and software tools for understanding biological data. It combines computer science, statistics, mathematics, and engineering to analyze biological data. Bioinformatics is an umbrella term for both (i) methodological studies that ‘develop’ novel algorithms, software packages, or analysis pipelines, and (ii) biological studies that ‘apply’ those tools as part of their methodology (7).

Genetic epidemiology studies the role of genetic factors in health and disease in families and/or populations (8). It was defined by Morton as “a science which deals with the etiology, distribution, and control of disease in groups of relatives and with inherited causes of disease in populations” (9). The wide availability of biological data and the convenience of modern bioinformatics-based methods have yielded rapid advances in discovering the genetic variants underlying human diseases and traits, an approach known as genomic association mapping (10, 11).

Unlike rare Mendelian disorders, common complex diseases or traits, such as ischemic heart disease, hypertension, or serum levels of inflammatory markers, are controlled by the complex interplay between multiple genetic and environmental factors (11). For that reason, gene mapping of complex traits or diseases had been largely disappointing in the past. Genomic association mapping is now a systematic approach aiming to find genetic variants that are associated with the trait or disease of interest.
Thanks to the inexpensive availability of microarray data and the accessibility of bioinformatics pipelines for data analysis, genomic mapping of complex traits or common diseases has become very successful in recent years (11).

Box 1 Timeline of key events of major projects facilitating genomic association mapping.

1990 Human Genome Project was launched
2001 Draft of Human Genome Project was published
2002 International HapMap Project was launched
2003 Human Genome Project was completed
2003 International HapMap Project first major public data released
2003 ENCODE project was launched
2007 Pilot phase of the ENCODE project was finished
2008 1000 Genomes Project was launched
2010 International HapMap Project latest public data released
2010 Phase 1 of 1000 Genomes Project was completed
2012 ENCODE project initial results were published
2014 Phase 3 of 1000 Genomes Project data released

Genomic association mapping consists of a number of classic steps, as described in Figure 1. First, we should check whether the trait or disease of interest is appropriate for genomic mapping. This can be done with a typical heritability study, that is, by investigating the proportion of variation in a trait that is due to genetic differences between individuals. If there is evidence of heritability, the trait is (partially) controlled by genetic factors and genomic association mapping is a logical next step (12).

Once it has been established that the trait of interest is heritable, we should attempt to identify genetic variants that are significantly associated with the trait. Although a number of different methodologies can be used, the experimental design of the genome-wide association study (GWAS) has been very successful and has played a key role in genomic association mapping during the past few years (11). GWAS analysis is typically aimed at detecting genetic variants associated with common complex traits or diseases without a prior hypothesis of functionality.
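The hypothesis-free scan described above can be illustrated with a minimal sketch. It is not the pipeline used in this thesis; it only assumes a quantitative trait, an additive genetic model (allele counts coded 0, 1, or 2), simulated data, a normal approximation to the test statistic (reasonable only for large samples), and the conventional genome-wide significance threshold of 5e-8. All function names and parameter values are hypothetical.

```python
import math
import random

# Conventional genome-wide significance threshold (an assumption, not from the text).
GENOME_WIDE_ALPHA = 5e-8


def single_variant_assoc(genotypes, phenotypes):
    """Regress the phenotype on the allele count (0/1/2) of one variant.

    Returns (beta, standard_error, p_value). The two-sided p-value uses a
    normal approximation to the t statistic.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic variant: no genotype variation")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, se, 0.0
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p_value


def simulate(n_samples=3000, n_variants=50, causal_index=7, effect=0.5, seed=1):
    """Simulate independent variants and a trait influenced by one of them."""
    rng = random.Random(seed)
    genos = []
    for _ in range(n_variants):
        maf = rng.uniform(0.2, 0.5)  # allele frequency of this variant
        genos.append([int(rng.random() < maf) + int(rng.random() < maf)
                      for _ in range(n_samples)])
    phenos = [effect * genos[causal_index][i] + rng.gauss(0, 1)
              for i in range(n_samples)]
    return genos, phenos


def scan(genos, phenos, alpha=GENOME_WIDE_ALPHA):
    """Test every variant without prior hypothesis; keep those with p < alpha."""
    hits = []
    for index, genotypes in enumerate(genos):
        beta, _se, p_value = single_variant_assoc(genotypes, phenos)
        if p_value < alpha:
            hits.append((index, beta, p_value))
    return hits


if __name__ == "__main__":
    genos, phenos = simulate()
    for index, beta, p_value in scan(genos, phenos):
        print(f"variant {index}: beta={beta:.3f}, p={p_value:.2e}")
```

Every variant is tested in the same way regardless of its presumed function, which is why a stringent threshold is needed to account for the large number of tests. A real analysis would additionally adjust for covariates and population structure and would use dedicated software.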