Generation and Bioinformatic Analysis of Synthetic Ago HITS-CLIP Data

IT 13 039
Degree project (Examensarbete), 45 credits
June 2013

Generation and Bioinformatic Analysis of Synthetic Ago HITS-CLIP Data
Mehmet Ali Arslan

Department of Information Technology (Institutionen för informationsteknologi)

Abstract

Micro-RNAs (miRNAs) have been found to regulate messenger RNA (mRNA) translation and degradation. Various recent studies have focused on miRNA target prediction in order to better understand the rules and nature of miRNA regulation of mRNAs. In this project we aim to create a software module that identifies miRNA target sites on mRNAs. As the basis for this project, we use a study that established a platform for mapping miRNA-mRNA interactions in protein-RNA complexes in mouse brain (the AGO HITS-CLIP study). We propose a probabilistic model of the data from this study and generate synthetic sample data according to this model, in order to create a test bed for a discovery module. Our discovery module analyzes the sample data to identify peak regions where the interaction density is high. To evaluate the module, we present results both on synthetic sample data and on data from the AGO HITS-CLIP study.

Supervisor: Jens Lagergren
Subject reviewer: Lars Arvestad
Examiner: Ivan Christoff
Printed by: Reprocentralen ITC

Faculty of Science and Technology (Teknisk-naturvetenskaplig fakultet), UTH unit
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Building 4, Floor 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Telefax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Contents
1 Introduction
2 Background
2.1 mRNA
2.1.1 Transcription
2.1.2 Translation
2.2 miRNA
2.3 miRNA-mRNA interaction
2.4 Poisson Distribution
3 Literature Review
4 Methodology
4.1 Inputs and their handling
4.1.1 Selected Genome and Genes
4.1.2 Ago HITS-CLIP Data
4.2 Synthetic Data Generation
4.2.1 Motivation
4.2.2 Parameters
4.2.3 Generation
4.3 Peak (Target Site) Detection
4.3.1 Peak Calling
5 Results
5.1 Peak calling on synthetic data
5.1.1 P-values
5.1.2 Peaks called
5.2 Peak calling on Ago data
6 Conclusions
References
7 Bibliography

Chapter 1 Introduction

miRNAs have been found to regulate gene expression by binding to mRNAs and either causing their degradation or inhibiting their translation. This in turn affects the production of the proteins encoded by the targeted mRNAs. Hence, every life-related function these proteins carry out is, indirectly, regulated by miRNAs (see Chapter 2, “Background”, for details). This is why we are interested in finding miRNA target sites on mRNAs.

As Chi et al. [1] provide a platform for investigating miRNA-mRNA interaction, we use the output of their study as the input to ours. By aligning the mRNA tags isolated from the Argonaute-miRNA-mRNA ternary complex to the genome, we aim to identify regions with a high density of interaction with miRNAs in this complex. We propose a probabilistic model of the aligned mRNA tags from [1].
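The report's model is developed in Chapter 4 (Methodology). Purely to illustrate the general idea, the following Python sketch uses deliberately simple assumptions that are not taken from this report: the number of aligned tags at each position of a transcript is Poisson-distributed with a constant background rate, except inside planted target sites, where the rate is higher. The sketch generates synthetic counts from that assumed model and then calls peaks with a sliding-window Poisson tail test and a Bonferroni correction. All function names (sample_poisson, generate_counts, poisson_sf, call_peaks) and all parameter values are hypothetical.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate using Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def generate_counts(length: int, bg_rate: float, peaks: list[tuple[int, int]],
                    peak_rate: float, seed: int = 0) -> list[int]:
    """Assumed model: tag count at each position ~ Poisson(bg_rate), or
    Poisson(peak_rate) inside the planted half-open intervals [start, end)."""
    rng = random.Random(seed)
    counts = []
    for pos in range(length):
        in_peak = any(start <= pos < end for start, end in peaks)
        counts.append(sample_poisson(peak_rate if in_peak else bg_rate, rng))
    return counts


def poisson_sf(k: int, mu: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i    # P(X = i) computed from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(counts: list[int], bg_rate: float, window: int = 20,
               alpha: float = 0.01) -> list[tuple[int, int]]:
    """Flag windows whose tag total is improbably high under Poisson(window * bg_rate),
    using a Bonferroni-corrected threshold, and merge overlapping hits into peaks."""
    n_windows = len(counts) - window + 1
    if n_windows <= 0:
        return []
    cutoff = alpha / n_windows   # Bonferroni correction over all tested windows
    mu = window * bg_rate        # expected window total under the background model
    peaks: list[tuple[int, int]] = []
    total = sum(counts[:window])
    for start in range(n_windows):
        if start > 0:            # slide: add the new right end, drop the old left end
            total += counts[start + window - 1] - counts[start - 1]
        if poisson_sf(total, mu) < cutoff:
            end = start + window
            if peaks and start <= peaks[-1][1]:
                peaks[-1] = (peaks[-1][0], end)  # extend the current peak
            else:
                peaks.append((start, end))       # open a new peak
    return peaks


if __name__ == "__main__":
    # Hypothetical example: a 1,000-position transcript with one planted 40-position target site.
    counts = generate_counts(1000, bg_rate=0.5, peaks=[(400, 440)], peak_rate=5.0, seed=1)
    print(call_peaks(counts, bg_rate=0.5))
```

A constant background rate is the simplest possible null model. For real Ago HITS-CLIP data, where expression levels differ between genes, a locally estimated background rate would likely be more appropriate.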
The proposed model is used both for generating synthetic sample data and in the target-site detection algorithm. The ability to generate synthetic sample data is important because it provides a test bed for the detection algorithm.

The rest of this report is organized as follows: Chapter 2 gives a brief background to the biology and probability theory behind the project, while Chapter 3 summarizes related studies. The methodology is presented in Chapter 4, and Chapter 5 presents and discusses the results of our experiments. Finally, Chapter 6 concludes the report.

Chapter 2 Background

2.1 mRNA

While the essence or meaning of life is an open debate, many functions necessary for life, including catalyzing metabolic reactions and DNA replication, are carried out by protein molecules. These molecules are encoded by genes in DNA. Messenger RNAs (mRNAs) carry this encoded genetic information, which specifies the amino acid sequence of a protein. Gene expression, that is, the manufacturing of a protein, happens in two main phases: transcription, in which the mRNA is generated, and translation, in which the genetic code in the mRNA is translated into a protein [2]. mRNAs in prokaryotic and eukaryotic cells have different properties and act slightly differently. We focus on eukaryotic mRNAs in this study, and the rest of the discussion considers only eukaryotic mRNAs.

2.1.1 Transcription

Transcription is the process in which an mRNA is generated as the complement of part of a DNA strand, called the template strand, that carries the genetic information for the protein to be encoded. As shown in Fig. 2.1, the process starts with the RNA polymerase enzyme (RNA polymerase II, or pol II, for mRNA transcription) binding to the gene's promoter region, the DNA region the enzyme needs in order to bind to the DNA. RNA polymerase then unwinds the DNA and adds complementary nucleotides, one at a time, to the 3’ end of the growing RNA until it reaches the termination site.
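To make the complementarity concrete, here is a minimal Python sketch that derives a transcript from a template strand. It assumes the template is written 5’→3’ and ignores promoters, termination signals, and splicing; the function name transcribe and the example sequence are hypothetical. Because RNA polymerase reads the template 3’→5’ while extending the RNA at its 3’ end, the transcript is the reverse complement of the template, with uracil (U) in place of thymine (T).

```python
# RNA base added opposite each DNA template base
# (an A in the template pairs with U, since RNA uses uracil instead of thymine).
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA transcript (5'->3') complementary to a DNA template strand given 5'->3'.

    The template is read starting from its 3' end, so we walk it in reverse and
    emit the complementary RNA base for each position.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5_to_3.upper()))


if __name__ == "__main__":
    template = "TTACGCCAT"       # hypothetical template strand, 5'->3'
    print(transcribe(template))  # -> AUGGCGUAA
```

The example transcript begins with the start codon AUG and ends with the stop codon UAA.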
The primary transcript produced this way is called a pre-mRNA, which is processed further to produce the mature mRNA [2]. This post-processing includes splicing, in which the introns are removed.

Figure 2.1: DNA transcription to RNA. Figure from Sadava et al. © 2008 Sinauer Associates [3]. Used with permission.

2.1.2 Translation

Translation is the synthesis of a protein from the information in the mRNA that encodes it (see Fig. 2.2). With the help of transfer RNAs (tRNAs) and the ribosome, the mRNA is read codon by codon (a codon being three nucleotides that specify an amino acid) to synthesize the amino acid chain that constitutes the initial form of a protein. For each codon that is read, a tRNA with the corresponding anticodon carries the amino acid encoded by that codon to the ribosome, where it is added to the growing amino acid chain [2, 4].

Figure 2.2: Summary of the translation process. Figure from Mariana Ruiz Villarreal/Wikimedia Commons. Used with permission.

2.2 miRNA

Before introducing miRNAs, note that only metazoan miRNAs and their functions in metazoans are considered in this project; the discussion below should be read with that restriction in mind.

The history of micro RNAs (miRNAs) goes back to 1993 [5], when it was discovered that the abundance of the LIN-14 protein was regulated by a short RNA product that inhibits its translation. The discovery was considered a peculiarity until, at the turn of the millennium, studies reported that the let-7 miRNA, which is evolutionarily conserved [6], regulates the expression of several genes [7]. At the time of writing (2013), a PubMed search for the keyword “microRNA” returns more than 15,000 citations, which gives an idea of the amount of interest in miRNAs in the new century.

Figure 2.3: miRNA biogenesis steps. Figure from He and Hannon © 2004 Nature Publishing Group [8]. Used with permission.

miRNAs are RNAs of ~22 nucleotides. The precursor of the mature miRNA (the pre-miRNA) is a ~70-nucleotide, imperfectly base-paired hairpin segment of the RNA from which the miRNA is derived [9].
Further biogenesis steps transform the pre-miRNA into the mature miRNA (see Fig. 2.3). The mature miRNA, together with Argonaute and several other proteins, is assembled into a complex named the RNA-induced silencing complex (RISC), also referred to as the miRNA-protein complex (miRNP). The miRNA directs this complex to the binding site on the mRNA, where the miRNP performs its functions on the mRNA.

The rules governing miRNA function are not yet fully understood, which is one of the reasons for the growing and diverse body of research on miRNAs. However, there is some common ground: for example, the human genome encodes several hundred unique miRNAs, and these miRNAs interact with thousands of mRNAs [10]. Since there are far more target mRNAs than miRNAs, a single miRNA must, on average, interact with more than one mRNA. It
