
Origin of genes with unresolved ancestry

56 Pages·2013·3.82 MB·English

Master in Proteomics and Bioinformatics

Origin of genes with unresolved ancestry
Analysis of orphan genes in H. sapiens, D. melanogaster and S. cerevisiae

Adrián César Razquin
Thesis supervisors: Dr. Evgenia Kriventseva and Prof. Evgeny Zdobnov
Université de Genève
January 2013

CONTENTS

Abstract
1 Introduction
1.1 On the origin of genes
1.2 Orthology
1.2.1 Homologs, orthologs and paralogs
1.2.2 Orthology inference methods
1.2.3 Orthology databases. OrthoDB.
1.3 Orphans
1.4 Aim of the project
2 Methods
2.1 Programming language
2.2 Identification of orphans
2.3 Correlation of number of orphans and evolutionary distance
2.4 GO-terms
2.5 Evolutionary rates
2.6 Sibling groups
2.7 Transposable elements
3 Results
3.1 Identification and characterization of orphans
3.1.1 Identification
3.1.2 Number of orphans and evolutionary distances
3.1.3 Gene Ontology analysis
3.1.4 Orphans as fast-evolving genes
3.2 Analysis of sibling groups
3.3 Analysis of transposable elements
3.4 Case study
4 Discussion
Appendix
Acknowledgements
References

Abstract

From the very beginning of comparative genomic studies, researchers became aware of the existence of a set of genes that show no clear homology when compared with other genomes. Although they were initially attributed to the lack of available genomic data, these orphan genes have been identified in all genome projects to date, and they might constitute an irreducible fraction of the genes present in the genome of every organism. Different hypotheses have been proposed in recent years to explain the evolutionary origin of orphans, but the issue still remains unclear.

The present study identifies and characterizes orphan proteins in H. sapiens, D. melanogaster and S. cerevisiae based on the orthology classification of OrthoDB, and aims to provide insight into some possible origins of orphans. We show that orphans constitute a large fraction of the proteomes of the three selected species, and we provide evidence of their high evolutionary rate and their relation to adaptive functions. Moreover, based on the analysis of homology with non-ortholog proteins, we suggest that most of them might have originated by duplication followed by accelerated evolution, alone or in combination with other processes involving domain rearrangements, such as exon shuffling.
Finally, we show that a small percentage of the orphans contain sequences derived from transposable elements (TEs) in their coding regions, which might also have contributed to their origin and divergence.

1 Introduction

1.1 On the origin of genes

Since the publication of the first whole-genome sequence of a free-living organism in 1995 [1], a rapidly increasing number of genomes has been sequenced and published every year. As an illustration, the Genomes OnLine Database (GOLD), a worldwide resource for monitoring genome and metagenome projects, contained information on 11472 sequencing projects in September 2011, of which 2907 had been completed (either finished or as permanent drafts). The statistics since 1995 show an approximately exponential growth in the number of genome sequencing projects [2].

The analysis of genomic data has revealed great variation in gene number among organisms, indicating the existence of a continuous process of gene origination. Several molecular mechanisms can be involved in the formation of new genes, creating them either from pre-existing genes or parts of them [3], or de novo from noncoding sequences. In any case, the new gene must appear in the germ line in order to be transmitted to the offspring and spread within the population, where it can finally have one of two major outcomes: fixation or extinction [4]. Here I briefly review the main characteristics of the processes that create novel genes.

I. Gene duplication. Gene duplication was the first mechanism proposed for gene origination, initially suggested by Muller and Haldane in the 1930s [5 and 6]. However, it was mainly thanks to Ohno's work in the 1970s [7] that gene duplication started to be regarded as the most important force generating gene novelty.

Two major types of duplication can take place in the genome of an organism: segmental duplications, which involve individual genes or genomic regions, and whole-genome duplications (WGD), which copy the entire genome. WGDs are rare but recurrent events that cause enormous genomic change and produce polyploid genomes, which can later undergo re-diploidization (e.g. plants such as Triticum aestivum, the bread wheat, which is hexaploid). This process has been described in both prokaryotes [8] and eukaryotes, with well-known cases in yeast, angiosperms and teleosts [9]. Up to two rounds of WGD are thought to have occurred before the origin of the vertebrate lineage [10]. Mechanisms underlying WGD events include genome doubling and gametic non-reduction, which represent cell cycle failures during mitosis and meiosis, respectively, and polyspermy [11].

Segmental duplications, on the other hand, occur more frequently than WGDs but affect smaller regions, ranging from a few nucleotides to several thousand kilobases. Two molecular mechanisms that can lead to segmental duplications are non-allelic homologous recombination (NAHR) and illegitimate recombination (IR) [12], both of which can be regarded as errors in normal cellular processes. NAHR, also called unequal crossing-over, is a recombination event between highly similar but non-allelic sequences, for example due to the misalignment of repeated elements such as low copy number repeats (LCRs) or transposable elements (TEs). IR results from the malfunction of the non-homologous end-joining (NHEJ) pathway for double-strand DNA break (DSB) repair, which fuses two free DNA ends [13]. Some studies in human and Drosophila concluded that NAHR is a frequent mechanism generating segmental duplications, mainly creating dispersed duplicates, while IR is rarer and plays a more important role in producing tandem duplications [14 and 15]. Homeobox [16] and globin [17] genes are well-known examples of segmental duplications.

Another source of duplications is retroposition, which creates duplicate genes in new genomic locations by reverse transcription of mature mRNAs, recruiting the necessary enzymatic activities from autonomous transposable elements such as the L1 elements in mammals. The copies resulting from retroposition differ from regular gene duplicates in several features, such as their lack of introns and the presence of flanking short direct repeats.
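
The two signatures of retroposed copies just mentioned (no introns, short flanking direct repeats) can be illustrated with a toy string model. This is a hypothetical sketch, not a method used in the thesis: a gene is a list of exon strings, "splicing" simply concatenates them, and the flanking repeats are modeled as a duplication of a few bases at the insertion site (a target-site duplication, assumed here as the source of the repeats). All sequences and names are made up.

```python
# Toy model of retroposition (hypothetical sketch; sequences are plain strings).

def splice(exons: list[str]) -> str:
    """Return the 'mature mRNA' of a gene by joining its exons (introns dropped)."""
    return "".join(exons)


def retropose(genome: str, mrna: str, site: int, tsd_len: int = 4) -> str:
    """Insert a DNA copy of `mrna` at position `site` of `genome`.

    The `tsd_len` bases just downstream of the site are duplicated, so the
    new copy ends up flanked by two identical short direct repeats.
    """
    if not 0 <= site <= len(genome) - tsd_len:
        raise ValueError("insertion site too close to the end of the sequence")
    tsd = genome[site:site + tsd_len]
    return genome[:site] + tsd + mrna + genome[site:]


exons = ["ATGGCC", "GATTAC", "CTGTAA"]
introns = ["GTAAGTTTAG", "GTGAGCCCAG"]
# The parental gene interleaves exons and introns (38 nt in total).
gene = exons[0] + introns[0] + exons[1] + introns[1] + exons[2]
mrna = splice(exons)  # 18 nt, intronless
genome = "AAAACCCCGGGGTTTT"
retrocopy_locus = retropose(genome, mrna, site=4)
print(retrocopy_locus)  # AAAACCCCATGGCCGATTACCTGTAACCCCGGGGTTTT
```

In the output, the intronless copy sits between two copies of `CCCC`, the duplicated target site.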

As promoter regions are usually not retroposed together with the original gene, these gene duplicates become non-expressed processed pseudogenes unless they recruit new regulatory sequences from neighboring genes, which depends on the genomic location of the new copy [3].

In his model, Ohno considered that most of the new genes originating from a duplication event would rapidly accumulate mutations and degenerate into pseudogenes [18]. Nevertheless, it is now known that even if many gene duplicates become pseudogenes, many others can be fixed and remain functional [19] through processes such as evolution of redundancy, subfunctionalization and neofunctionalization [12]. Evolution of redundancy happens when the increase in gene dosage caused by a duplication is favored by natural selection, so that the two gene copies, which perform the same function, coexist unchanged in the genome. This fate has been described especially in yeast [20]. Subfunctionalization, also called the duplication-degeneration-complementation (DDC) model [19, 21 and 22], is the partitioning of the ancestral gene function between the two copies, mainly by complementary degeneration of regulatory elements. In this case, the union of the activities and expression patterns of the two duplicates reproduces the original function of the ancestral gene. A well-known example of this process is the duplicate pair GAL1/GAL3 of Saccharomyces cerevisiae [23]. Finally, neofunctionalization consists in the acquisition of novel functions by one of the gene copies while the other copy retains the ancestral function. Because the original function is ensured by one of the copies, the other duplicate can be subject to more relaxed purifying selection and can therefore rapidly accumulate mutations that would otherwise be eliminated. This accelerated evolution can include not only nucleotide substitutions, but also larger modifications such as insertions and deletions, domain shuffling, fusions and fissions.
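
The fates of a duplicate pair described above differ in how the ancestral function is distributed between the two copies, which a set-based toy model can make explicit. This is a hypothetical sketch: each gene's function is reduced to an illustrative set of activities or expression domains (e.g. tissues), and `classify_fate` is not a method from the thesis; real fates are inferred from much richer data.

```python
def classify_fate(ancestral: set[str], copy_a: set[str], copy_b: set[str]) -> str:
    """Toy classification of the fate of a duplicate pair.

    Each gene's function is modeled as a hypothetical set of activities
    or expression domains.
    """
    if not copy_a or not copy_b:
        return "pseudogenization"       # one copy has lost all function (Ohno's expectation)
    if copy_a == ancestral and copy_b == ancestral:
        return "redundancy"             # both copies keep the full ancestral function
    if copy_a | copy_b == ancestral and ancestral not in (copy_a, copy_b):
        return "subfunctionalization"   # complementary partition of the function (DDC)
    if (copy_a == ancestral and copy_b - ancestral) or (
        copy_b == ancestral and copy_a - ancestral
    ):
        return "neofunctionalization"   # one copy keeps it, the other gains something new
    return "other"


anc = {"liver", "brain"}
print(classify_fate(anc, {"liver"}, {"brain"}))           # subfunctionalization
print(classify_fate(anc, {"liver", "brain"}, {"testis", "liver"}))  # neofunctionalization
```

The checks are ordered so that loss of function is tested first, mirroring the idea that pseudogenization is the default outcome unless selection preserves a copy.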

In neofunctionalization, mutations in the freed copy may eventually produce a new allele with a beneficial function for the organism, which is then preserved by selection. Several works have reported the influence of adaptive natural selection on the evolution of young gene duplicates [24 and 25]. As an illustration, many proteins of the Major Histocompatibility Complex (MHC) are known to have originated from gene duplication and subsequent neofunctionalization [26].

Furthermore, the nature of the duplication (segmental or WGD) can have a direct influence on the fate of the newborn genes. For example, a segmental duplication of a gene whose product has many interaction partners (e.g. components of signaling pathways, or transcriptional regulators) would disrupt the stoichiometry of the complex, and the duplication would most probably be counter-selected. Conversely, if the duplication results from a WGD event, the gene dosage balance is maintained, as all the interactors of the network are duplicated, and therefore the duplication can be retained [27].

Around 15% of human genes are thought to have arisen by duplication, as well as 8-20% of the genes of Drosophila melanogaster, Caenorhabditis elegans and Saccharomyces cerevisiae [11]. Moreover, gene duplication frequencies have been estimated at more than 100 gene duplications per million years in the human genome [28] and 17 genes per million years in flies [29].

II. Exon shuffling and gene fusion and fission. Exon shuffling and, to a smaller extent, gene fusion and fission are processes that generate new exon arrangements that can lead to the creation of novel chimeric genes.

Exon shuffling, or intron-mediated recombination, is the process through which a new exon is inserted into an existing gene, or an exon is duplicated within a gene, to generate a new intron-exon structure [3]. This mechanism for generating new genes was initially proposed in 1978 by Walter Gilbert [30], and is nowadays thought to be one of the major evolutionary forces shaping eukaryotic genomes and proteomes. Around 19% of exons in eukaryotic genes are estimated to have been created by exon shuffling [3].

The same mechanisms that produce segmental duplications can lead to the ectopic recombination of exons: non-homologous or illegitimate recombination, and retroposition. Consequently, exon shuffling seems to be of great importance in the evolution of complex genomes, as these contain many introns and repetitive elements that facilitate intronic recombination events [31]. Exon shuffling also appears to have been especially important in the formation of multidomain proteins in metazoans [32 and 33].

Two features of genomes are used to evaluate exon shuffling: intron phase and exon symmetry. Intron phase is the position at which the intron interrupts the reading frame of a gene: phase 0 introns are placed between codons, whereas phase 1 and phase 2 introns lie after the first and the second nucleotide of a codon, respectively. Exon symmetry classifies exons into nine categories according to the phases of the introns surrounding them (three possible phases for the upstream intron times three for the downstream intron). Exons are symmetric if both intron phases are equal and asymmetric otherwise. The importance of this classification lies in the fact that only symmetric exons can be inserted, duplicated or deleted in introns of the same phase without changing the reading frame, so these events would be less affected by purifying selection. Accordingly, researchers have detected an excess of symmetric exons in most eukaryotic genomes analyzed to date, mainly of type 1-1 in metazoan species and of type 0-0 in non-metazoans. This suggests the importance of exon shuffling in shaping genomes [33 and 34].

Gene fusions and fissions have been described mainly in the origin of many genes in prokaryotic genomes [35], although examples in eukaryotes also exist, e.g. the fusion gene KUA-UEV in humans [36]. Kummerfeld et al. estimate that fusions are around four times more frequent than fissions, probably because they imply the deactivation or loss of certain gene regions, which is genetically simpler to achieve than the gain of the same regions that a fission event requires.

The regions that a fusion must lose or deactivate are mainly terminal sequences of the first gene, such as translation stop codons or transcription termination signals, and initial regulatory regions of the second gene, such as promoter regions [37].

III. Transposable elements and lateral gene transfer. Transposable elements (TEs) are another very important source of gene novelty. Indeed, since the discovery of "jumping genes" by the Nobel laureate Barbara McClintock in the 1950s [38 and 39], knowledge and understanding of mobile elements in the scientific community have increased dramatically. From being regarded simply as "junk" or "selfish" DNA, TEs are now considered one of the major components of genomes, playing a very important role in evolution by promoting and controlling processes such as chromosomal rearrangements, gene expression, and even population adaptation and speciation [40].

TEs are pieces of DNA with the capability of changing their genomic location, very often making a duplicate copy of themselves during the process. The currently most accepted classification of TEs [41–43] divides them into two main classes, depending on the nature of the transposition intermediate: Class I transposons, or "retrotransposons", use an RNA intermediate and reverse transcription in a "copy-and-paste" manner, and Class II transposons, or "DNA transposons", use a DNA intermediate, mainly by "cut-and-paste". These classes are further subdivided into subclasses, orders, superfamilies and families based on mechanistic, enzymatic, structural and sequence similarity criteria. The traditional view separated Class I transposons into LTR retrotransposons, which contain flanking direct Long Terminal Repeats (LTRs) and are related to retroviruses, and non-LTR retrotransposons, which lack these LTRs. Two well-known groups of non-LTR retrotransposons are LINEs (Long INterspersed Elements) and SINEs (Short INterspersed Elements), which differ mainly in their length and coding capacity.
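
The simplified criteria just described (transposition intermediate, presence of LTRs and, for non-LTR elements, coding capacity) can be written as a small decision key. This is a hypothetical illustration: `classify_te` and its parameters are invented here, and it does not reproduce the full hierarchical classification (subclasses, orders, superfamilies, families) of [41–43].

```python
def classify_te(intermediate: str, has_ltrs: bool = False,
                encodes_proteins: bool = True) -> str:
    """Toy decision key for the TE groups named in the text (hypothetical)."""
    if intermediate == "DNA":
        return "Class II / DNA transposon (cut-and-paste)"
    if intermediate != "RNA":
        raise ValueError("intermediate must be 'RNA' or 'DNA'")
    if has_ltrs:
        return "Class I / LTR retrotransposon (copy-and-paste)"
    # Non-LTR retrotransposons: separated here only by coding capacity,
    # a simplification of the length and coding differences between LINEs and SINEs.
    if encodes_proteins:
        return "Class I / non-LTR / LINE-like"
    return "Class I / non-LTR / SINE-like"


print(classify_te("RNA", encodes_proteins=False))  # Class I / non-LTR / SINE-like
```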

Some of these groups of TEs are autonomous by definition, meaning that they encode the proteins necessary for the transposition process (this is the case of DNA transposons, LTR retrotransposons and LINEs), while others, such as SINEs, are non-autonomous, because they lack ORFs and therefore depend in trans on other TEs for mobilization. Nevertheless, a large fraction of any organism's TEs are defective and have lost their mobilization capacity, hence remaining inactive.

The systematic analysis of genome sequences has shown great variability in TE content from species to species. As an illustration, TEs make up 80% or more of the genomic DNA of most plants, but vary between 3 and 20% in fungi and between 3 and 45% in metazoans. Around 44% of the human genome is made of TEs, and although all the main classes are present, non-LTR retrotransposons (LINEs and SINEs) predominate and together make up more than 70% of human TEs. The TE content of Drosophila is around 18%, and no SINEs are found; this genome also seems to contain many young and active families. On the other hand, the yeast S. cerevisiae contains only LTR retrotransposons (the so-called Ty elements), which in total constitute no more than 5% of its genome [40 and 41].

The capacity of TEs to evolve genomic functions is nowadays a well-documented phenomenon, usually referred to as TE "domestication" or "neofunctionalization" [44 and 45]. Through exonization of TE sequences, new genes or parts of genes can be created [45]. As an illustration, Alu elements, a type of SINE that makes up more than 10% of the human genome, are frequently exapted to create new exons in primate genomes, a process that has been estimated to account for more than 50% of new exons in H. sapiens. This capacity has been attributed to the existence of multiple sites in Alu elements that are similar to splice sites (pseudo splice sites) and can become active through the accumulation of mutations [46]. Accordingly, 4% of human genes are estimated to contain TE motifs in their coding sequences [31].
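
The human TE figures given above (around 44% of the genome, of which more than 70% are non-LTR retrotransposons) can be combined into a rough lower bound. This is a back-of-envelope estimate, not a figure reported in the thesis, treating "more than 70%" as exactly 0.70:

```latex
f_{\text{LINE+SINE}} \gtrsim f_{\text{TE}} \times p_{\text{non-LTR}\mid\text{TE}}
  = 0.44 \times 0.70 \approx 0.31
```

That is, LINEs and SINEs alone would account for roughly a third of the human genome, in contrast with the at most 5% of the S. cerevisiae genome occupied by all of its TEs.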

Some well-known examples of genes created by exonization from TEs are the RAG genes of the immunoglobulin system, which originated from DNA transposons [47], and the TART and HeT-A elements in Drosophila, which act as telomerases [48].

Lateral or horizontal gene transfer (HGT) refers to the transmission of genetic material between different organisms, in contrast to the usual parent-to-offspring vertical inheritance. HGT is common in bacteria and has long been recognized as a major force driving prokaryote evolution [49]. Nevertheless, it has been shown that TEs can be horizontally transferred between eukaryotic species, indicating that HGT can also play an important role in generating genomic variation and innovation in eukaryotes. Probably the best-known case of an HGT event between eukaryotes is the transfer of the P elements of Drosophila, a family of TEs that escaped from D. willistoni and colonized the genome of laboratory strains of D. melanogaster [50]. Schaack et al. identified 218 convincing cases of HGT events in different eukaryotic lineages (plants, fungi,
