Dissertation - Matthias Arnold PDF

212 Pages·2016·27.45 MB·German

Supporting the evidence for human trait-associated genetic variants by computational biology methods and multi-level data integration

Matthias Arnold
2016

TECHNISCHE UNIVERSITÄT MÜNCHEN
Lehrstuhl für Genomorientierte Bioinformatik

Complete reprint of the dissertation approved by the Fakultät Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Dr. rer. nat.).

Chair: Univ.-Prof. Dr. J. J. Hauner
Examiners of the dissertation:
1. Univ.-Prof. Dr. H.-W. Mewes
2. Univ.-Prof. Dr. Dr. F. J. Theis
3. Univ.-Prof. Dr. F. Kronenberg, Medizinische Universität Innsbruck, Austria

The dissertation was submitted to the Technische Universität München on 10 March 2016 and accepted by the Fakultät Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt on 18 July 2016.

Acknowledgments

A doctoral candidate's greatest thanks go to the doctoral supervisor, and not only by tradition, so I too would first like to sincerely thank (Hans-)Werner Mewes for making this project possible, for his trust in my abilities, and for the almost unlimited freedom he granted me during my work. My fellow doctoral students, however, probably contributed to this dissertation as much as my supervisors and I did. For the scientific as well as private discussions, the regular boosts of motivation, and the unforgettable "company outings" to Schwabing, I would therefore like to thank, in second place, Jörn Leonhardt, Johannes Raffler, Florian Büttner, and Daniel Ellwanger. Special thanks also go to Gabi Kastenmüller, who took over my direct supervision when my former group leader left the institute.
I thank Arne Pfeufer for the many discussions and ideas. Without the support of these two, most of the projects in this doctoral thesis would not have been possible, so a good part of the quality of this work is owed to them. I would like to thank my second examiner, Fabian Theis, for his positive, motivating, and stimulating comments on my work. I also owe thanks to the numerous internal and external collaboration partners who supported my projects or involved me in theirs: Jan Krumsiek of the ICB, Karsten Suhre of the IBIS (and from Qatar), Matthias Wjst of the ILBD, and Christian Gieger of the IGE at Helmholtz Zentrum München; Elke Rodriguez, Stephan Weidinger, and Hansjörg Baurecht of the Uniklinikum Kiel; Nicole Soranzo of the Wellcome Trust Sanger Institute; So-Youn Shin of the University of Bristol; Stefan Herms of the Universität Basel; and Thilo Dörk-Bousset of the Medizinische Hochschule Hannover. I would also like to thank "my" students Kinga Balázs, Christoph Schramm, David Fuggersberger, Quirin Heiß, Niklas de Andrade Krätzig, and Nick Lehner for the pleasant collaboration, and all colleagues at IBIS and ICB not mentioned by name for the collegial environment. I will forgo thanking family and friends here, since a mention on a piece of paper cannot express what would need to be said. To my parents, however, I would still like to address a quote, since they have waited long enough for it: “Ich habe fertig.” (G. Trapattoni)

Abstract

Genome-wide association studies (GWAS) are an effective tool for mapping genetic regions that contribute to multifactorial human traits and diseases, and they have yielded a catalog of thousands of robust associations.
The major recurring criticism of the GWAS approach is that the obtained loci are of only limited value, because in most cases the associations can neither be linked to a plausible causal gene nor provide information on the molecular background involved in trait development and progression. This thesis provides a detailed investigation of the challenges arising from this issue and proposes various evidence-based and integrative computational approaches, as well as a novel bioinformatics tool, that enable comprehensive functional characterization of GWAS loci and thereby facilitate the elucidation of potential mechanisms underlying genotype-trait associations.

The first part of the thesis describes three GWA studies that were conducted during this work to identify and characterize specific challenges in the interpretation of GWAS results. The first study investigates the influence of common genetic variants and rare copy number variants (CNVs) on sudden infant death syndrome (SIDS). While the results showed only indicative evidence for weak additive effects of common variants on SIDS risk, analysis of CNVs revealed rare deletion syndromes as likely causes of sudden infant death for a substantial number (12 of 301) of the cases. Two further GWAS focused on common genetic variants influencing the concentration of metabolites in human blood and urine samples. Here, we identified and replicated more than 150 genetic loci, thus providing a large compendium of genomic regions implicated in the genetic control of human metabolic homeostasis. In addition to the central study results, I illustrate the challenges associated with the GWAS approach by showing the complexity of interpreting weak genetic influences on extreme disease endpoints such as SIDS.
In the GWAS on blood metabolic traits, I then emphasize the utility of thorough manual annotation of genetic associations to identify the most plausible causal gene and to suggest a testable effect hypothesis for each identified locus. Finally, in the urine metabolomics GWAS, I propose a method to automate the identification of predicted causal genes using a straightforward evidence-based gene prioritization metric.

To enable and facilitate automated causal gene prediction, in the second part of this thesis I developed an extensive data integration resource. This resource, representing the first genetic variant-based genome browser, allows for comprehensive annotation of the impact of genetic variation using evidence-based variant effect predictions. In the development process, I integrated, harmonized, and consolidated genome-wide annotation data from various sources comprising genes, transcripts, proteins, genetic variants, regulatory elements including microRNA binding sites, enhancers, and promoters, a set of genome- and exome-wide conservation and deleteriousness scores, as well as a large collection of trait annotations and associations for genes and genetic variants. The browser is extended by modules for the analysis, aggregation, and visualization of genomic annotations linked to genetic variants on a genome-wide scale. The resource thus provides both interfaces to the collected data and a semantic categorization of the available variant-linked evidence into logical sections, which enables direct hypothesis generation from the modules' output.

In the third and final part of the thesis, I demonstrate the value of integrative bioinformatics approaches by using the data incorporated in this resource to shed light on the potential molecular consequences of genetic variants identified by GWAS from three perspectives.
In the first study, I introduce the concept of biological networks by integrating genetic variants and their previously collected associated diseases in a directed bipartite network. Analysis of this network showed that identical genetic loci frequently influence several different complex diseases, with both agonistic and antagonistic effect directions. It remains an open question whether such loci should be considered pleiotropic, featuring conditionally distinct functions, or whether they pinpoint the same nodes in a cellular pathway that, depending on further genetic and environmental influences, lead to diverging phenotypic endpoints. The shared association signal observed for melanoma and vitiligo located in the tyrosinase gene, which has a central function in skin pigmentation, serves as an example for the former hypothesis. Here, the allelic effects suggest inverse trait-specific antigenicity of the encoded TYR protein that results in skin pigmentation being either elevated (as in skin cancer) or depleted (as in vitiligo), depending on allelically determined active or inactive targeting of TYR antigens via immune surveillance.

The second study investigates the collected target sites of microRNAs, a special class of small non-coding RNAs involved in post-transcriptional gene regulation, for interrelations with trait-associated genetic variants. I demonstrate that trait-associated variants are significantly enriched in the 3'-untranslated region of human transcripts, which is the major targeting region of microRNAs. Using the results of the blood metabolomics GWAS, I show that for a large fraction (>10%) of genetic loci linked to metabolic traits there is evidence for the involvement of genetically influenced microRNA regulation in metabolic control.
The very specific mechanism described for genetic alteration of lipoprotein lipase-controlled lipid homeostasis, in which allele-dependent targeting of lipoprotein lipase transcripts by miR-410 modulates its functional capacity, underlines the value of this approach.

The third study explores regulatory effects of genetic variants affecting the promoter and enhancer elements contained in the developed variant annotation resource. To characterize allele-specific effects on gene regulation, I used a novel clustering of cross-tissue regulatory element annotations. I show that the information aggregated within clusters can reveal direct interactions between enhancer elements, specific transcription factors, and the expression of more distal genes. The utility of the derived clusters in predicting allele-specific modifications of gene regulation is exemplified by a genetic locus from our blood metabolomics GWAS that is associated with alpha-hydroxyisovalerate levels. The associated haplotype is predicted to alter the binding motif of the Myc/Max transcription factor complex in a distal promoter-associated enhancer, leading to experimentally validated allele-specific changes of lactate dehydrogenase A expression. Combination with additional metabolic and enzymatic evidence further indicates a potential pleiotropic role of the encoded dehydrogenase in aerobic branched-chain amino acid and anaerobic lactate metabolism.

In summary, this thesis provides a detailed motivation for the application of large-scale integrative approaches in human genetic studies, illustrated using the findings of three GWA studies. With the implementation of a free-to-use, extensible, updatable, and programmatically accessible data integration resource, I introduce a novel bioinformatics platform that meets the requirements of integrative methods for causal gene prediction in the GWAS context in a comprehensive, yet user-friendly, way.
In three studies covering different aspects of the molecular consequences introduced by genetic variation, I finally demonstrate that integrative methods based on this resource successfully yield novel, specific, and testable hypotheses for further investigation.
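
As an illustration of the variant-disease network idea summarized in the abstract, the following is a minimal sketch, not the thesis implementation. It assumes a toy edge list in which each locus is linked to a disease together with the effect direction of the same allele; all locus and disease names, the `EDGES` list, and the functions `build_network` and `shared_loci` are hypothetical and chosen only for illustration. A locus shared by two diseases is labeled "agonistic" when the allele acts in the same direction on both and "antagonistic" otherwise.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (locus, disease, effect direction of the same allele).
# +1 means the allele raises risk, -1 means it lowers risk.
EDGES = [
    ("locus_1", "disease_A", +1),
    ("locus_1", "disease_B", -1),
    ("locus_2", "disease_A", +1),
    ("locus_2", "disease_C", +1),
    ("locus_3", "disease_B", -1),
]


def build_network(edges):
    """Return a bipartite adjacency map: locus -> {disease: direction}."""
    network = defaultdict(dict)
    for locus, disease, direction in edges:
        network[locus][disease] = direction
    return network


def shared_loci(network):
    """Yield (locus, disease_1, disease_2, label) for each disease pair
    connected through the same locus."""
    for locus, diseases in sorted(network.items()):
        for (d1, s1), (d2, s2) in combinations(sorted(diseases.items()), 2):
            label = "agonistic" if s1 == s2 else "antagonistic"
            yield locus, d1, d2, label


if __name__ == "__main__":
    for row in shared_loci(build_network(EDGES)):
        print(row)
```

Loci linked to only one disease (here `locus_3`) produce no pair and are skipped. A real analysis would additionally have to group variants into loci and harmonize alleles across studies before effect directions can be compared.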
