
Codon Evolution: Mechanisms and Models PDF

297 Pages·2012·3.205 MB·English

Preview Codon Evolution: Mechanisms and Models

Codon Evolution
Mechanisms and Models

EDITED BY
Gina M. Cannarozzi
University of Bern, Switzerland
Adrian Schneider
University of Utrecht, The Netherlands

Great Clarendon Street, Oxford OX2 6DP
Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide in
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trademark of Oxford University Press in the UK and in certain other countries
Published in the United States by Oxford University Press Inc., New York
© Oxford University Press 2012
The moral rights of the authors have been asserted
Database right Oxford University Press (maker)
First published 2012
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above
You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Library of Congress Control Number: 2011944051
Typeset by SPI Publisher Services, Pondicherry, India
Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY
ISBN 978–0–19–960116–5
1 3 5 7 9 10 8 6 4 2

Contents

Foreword
Nick Goldman and Ziheng Yang
Preface
List of Contributors

Part I: Modelling codon evolution

1: Background
Adrian Schneider and Gina M. Cannarozzi
  1.1 Models of molecular evolution
  1.2 Markov models
    1.2.1 Markov chains
    1.2.2 Multiple substitutions
    1.2.3 Continuous-time processes
    1.2.4 Time-reversibility
  1.3 Maximum-likelihood estimation
    1.3.1 ML example
    1.3.2 Posterior probabilities
    1.3.3 Likelihood of a phylogenetic tree
  1.4 Performance assessment
    1.4.1 Likelihood-based tests
    1.4.2 Simulations
    1.4.3 Empirical tests

2: Parametric models of codon evolution
Maria Anisimova
  2.1 Basic Markov models of codon substitution
    2.1.1 From DNA substitution models to codon models
    2.1.2 Estimating codon frequency distribution
  2.2 Evaluating selective pressure at the protein level
    2.2.1 The neutral theory and the likelihood ratio test (LRT) for positive selection
    2.2.2 Modelling variable selection pressure over time
    2.2.3 Modelling variable selection pressure among sites
    2.2.4 Predicting locations of sites under positive selection
    2.2.5 Detecting positive selection in presence of recombination
    2.2.6 Modelling variable selection pressure among sites and over time
  2.3 Measuring selection on physico-chemical properties of amino acids
  2.4 Modelling site-dependence in coding sequences
  2.5 Further development of parametric models

3: Empirical and semi-empirical models of codon evolution
Adrian Schneider and Gina M. Cannarozzi
  3.1 Introduction
  3.2 Empirical model by Schneider et al. (2005)
    3.2.1 Methods
    3.2.2 Results and discussion
    3.2.3 Conclusion
  3.3 Combined model by Doron-Faigenboim and Pupko (2007)
    3.3.1 Methods
    3.3.2 Discussion
  3.4 Model by Kosiol et al. (2007)
    3.4.1 Methods
    3.4.2 Discussion
  3.5 Codon test
  3.6 Empirical search for the most important parameters
  3.7 Summary

4: Monte Carlo computational approaches in Bayesian codon-substitution modelling
Nicolas Rodrigue and Nicolas Lartillot
  4.1 Introduction
  4.2 The Bayesian framework
  4.3 Site-independent models of codon substitution
    4.3.1 The Muse and Gaut, and Goldman and Yang-based models
    4.3.2 Plain MCMC
    4.3.3 Thermodynamic MCMC
  4.4 Site-interdependent models of codon substitution
    4.4.1 The Robinson et al.-based models
    4.4.2 Plain MCMC
    4.4.3 Thermodynamic MCMC
  4.5 Other recent modelling innovations and overall rankings
  4.6 Future directions

5: Likelihood-based clustering (LiBaC) for codon models
Hong Gu, Katherine A. Dunn, and Joseph P. Bielawski
  5.1 Introduction
  5.2 Theory for likelihood-based clustering (LiBaC)
  5.3 Detecting positive selection in a large-scale analysis of real gene sequences
  5.4 Objective comparison of model-based classifications
  5.5 Simulation studies of model-based classification
    5.5.1 Performance of LiBaC and other methods on simulated data
    5.5.2 Tradeoffs between precision and recall under LiBaC are adjustable by the posterior probability cutoff
  5.6 Recommendations for using LiBaC

6: Detecting and understanding natural selection
Maria Anisimova and David A. Liberles
  6.1 Selective mechanisms operating on gene sequences
  6.2 Brief overview of statistical methodologies for detecting positive selection
    6.2.1 Neutrality tests based on frequency spectrum
    6.2.2 Neutrality tests based on variability within and between species
    6.2.3 Poisson random-field models (PRF)
    6.2.4 Methods based on population differentiation
    6.2.5 Methods based on linkage disequilibrium (LD) and haplotype structure
    6.2.6 Methods based on detecting rate shifts
    6.2.7 Detecting selection based on dN/dS with Markov codon models
  6.3 The utility and the interpretation of the dN/dS measure
  6.4 Accounting for indels and overlapping ORFs
  6.5 Model-based approaches and common misconceptions
  6.6 Selection and adaptive traits
  6.7 Lessons from genomic studies and implications for studies of genetic disease

7: Codon models as a vehicle for reconciling population genetics with inter-specific sequence data
Jeffrey L. Thorne, Nicolas Lartillot, Nicolas Rodrigue, and Sang Chul Choi
  7.1 Introduction
  7.2 The importance of phenotype
  7.3 The Halpern–Bruno approach
    7.3.1 The basic idea
    7.3.2 Population genetic interpretations through retrofits
    7.3.3 The Robinson model
    7.3.4 The Sella–Hirsh refinement
    7.3.5 The ˘ parameter
    7.3.6 Applications and potential applications
  7.4 Limitations of the Halpern–Bruno approach
    7.4.1 The stationarity assumption
    7.4.2 The low mutation rate assumption and the Hill–Robertson effect
  7.5 Future directions

8: Robust estimation of natural selection using parametric codon models
Gavin A. Huttley and Von Bing Yap
  8.1 Introduction
  8.2 Context-dependent substitution models
  8.3 Evaluating properties of dinucleotide models
    8.3.1 Analysis of simulated data
    8.3.2 Analysis of primate introns
  8.4 Evaluating properties of codon models
    8.4.1 Analysis of simulated data
    8.4.2 Analysis of primate introns
  8.5 Impact of model definitions on statistical power
  8.6 Conclusion

9: Simulation of coding sequence evolution
Miguel Arenas and David Posada
  9.1 Introduction
  9.2 Simulation of coding sequences
    9.2.1 Forward simulations
    9.2.2 Simulations of coalescent histories
    9.2.3 Simulation of codon substitutions
  9.3 Uses of simulated coding data
  9.4 Software implementations

10: Use of codon models in molecular dating and functional analysis
Steven A. Benner
  10.1 Introduction
  10.2 The level of analysis most useful for functional biology
  10.3 Improving codon analysis beyond the Ka/Ks and dN/dS ratios
  10.4 Heuristic approaches to improve codon analysis beyond the Ka/Ks and dN/dS ratios
  10.5 Clocks
  10.6 Calibrating the TREx clock
  10.7 Conclusions

11: The future of codon models in studies of molecular function: ancestral reconstruction and clade models of functional divergence
Belinda S. W. Chang, Jingjing Du, Cameron J. Weadick, Johannes Müller, Constanze Bickelmann, D. David Yu, and James M. Morrow
  11.1 Introduction
  11.2 Ancestral reconstruction
  11.3 Reconstructing synonymous evolution in vertebrate rhodopsins
  11.4 Clade models of functional divergence
  11.5 Testing for functional divergence among teleost SWS2 opsins
  11.6 Conclusions

12: Codon models applied to the study of fungal genomes
Gabriela Aguileta and Tatiana Giraud
  12.1 Introduction
  12.2 Fungi as pathogens
    12.2.1 Adaptive evolution: characterizing functional divergence and associated selective pressure changes
    12.2.2 Host–pathogen evolution: detecting arms races through the evolution of R-genes, avirulence genes, as well as fungal effectors and elicitors
    12.2.3 Lifestyle-associated adaptations: from saprophytes to pathogens
  12.3 Fungi as symbionts: selective pressure to maintain symbiosis in mycorrhizae and lichens
  12.4 Evolution of codon usage in fungal genomes
    12.4.1 Fungi as eukaryotic models of codon usage evolution
    12.4.2 Codon models applied to detect codon bias in fungi: translational selection
    12.4.3 Fungal preferred codon uses
  12.5 Functional shifts: measuring the concomitant variation in selective pressure
  12.6 Adaptive evolution of gene expression: wiring and re-wiring regulatory networks
  12.7 Ancestral polymorphisms: maintaining allelic variants for extended periods
  12.8 The origin of sexual chromosomes in Fungi: reduced selection efficiency and degenerative changes in preferred codon usage
  12.9 Finding genes associated with specialization and speciation
  12.10 Conclusion: new uses of codon models for analysing fungal genomes

Part II: Codon usage bias

13: Measuring codon usage bias
Alexander Roth, Maria Anisimova, and Gina M. Cannarozzi
  13.1 Introduction
  13.2 Causes of codon usage bias
    13.2.1 Mutational biases affecting codon usage
    13.2.2 Selection affecting codon usage
  13.3 Applications for indices of codon usage bias
  13.4 Previous studies of codon usage indices
  13.5 Measures of codon bias
    13.5.1 Relative codon frequencies
    13.5.2 Measures based on reference
    13.5.3 Measures based on the geometric mean
    13.5.4 Measures based on deviation from an expected distribution
    13.5.5 Measures based on information theory
    13.5.6 Measures focusing on tRNA interaction
    13.5.7 Measures based on intrinsic properties of codon usage
    13.5.8 Measures for total codon usage in genomes
  13.6 Dependencies of measures
    13.6.1 Dependence on nucleotide composition
    13.6.2 Dependence on gene length
    13.6.3 Dependence on the degree of codon degeneracy
    13.6.4 Dependence on the skewness of synonymous codon usage
    13.6.5 Dependence on amino acid discrepancy
  13.7 Comparisons using biological data
    13.7.1 Correlation with transcript and protein levels
    13.7.2 Correlation with rate of protein synthesis
  13.8 Limitations of codon usage indices
  13.9 Conclusions

14: Detection and analysis of conservation at synonymous sites
Nimrod D. Rubinstein and Tal Pupko
  14.1 Introduction to conservation
  14.2 Classical view regarding synonymous mutations as neutral
  14.3 Conservation due to translational optimization
  14.4 Conservation due to mRNA structure
  14.5 Conservation due to overlapping genes
  14.6 Conservation to maintain splicing signals
  14.7 Application of codon models to the detection of conserved synonymous sites
  14.8 Other cis-encoded elements responsible for synonymous conservation
  14.9 Concluding remarks

15: Distance measures and machine learning approaches for codon usage analyses
Fran Supek and Tomislav Šmuc
  15.1 Causes of biased codon usage
  15.2 Methods for quantifying codon biases
    15.2.1 Unsupervised methods
    15.2.2 Supervised methods
  15.3 Application to bacterial and archaeal genomes
    15.3.1 Rationale behind using classifiers to control for background nucleotide composition
    15.3.2 An example application of supervised machine learning in codon usage analysis
    15.3.3 Proportion of genomes subject to translational selection and correlations with gene functional categories
    15.3.4 Distribution of codon-optimized genes within specific gene functional categories and relationship to microbial lifestyle
    15.3.5 mRNA expression levels and codon preferences of genes subject to translational selection

16: The application of population genetics in the study of codon usage bias
Kai Zeng
  16.1 Introduction
  16.2 Theory
    16.2.1 The reversible mutation model and the infinite sites model
    16.2.2 Parameter estimation and data preparation under the RM model
    16.2.3 Parameter estimation and data preparation under the IS model
  16.3 Some recent theoretical developments
    16.3.1 Methods that take account of the effects of recent changes of population size
    16.3.2 A multi-allele model with reversible mutation
    16.3.3 The effects of linkage on parameter estimation
  16.4 Conclusion

17: Structural and molecular features of non-standard genetic codes
Maria do Céu Santos and Manuel A. S. Santos
  17.1 Overview
    17.1.1 Genetic code diversity: mitochondrial and nuclear
    17.1.2 Neutral and non-neutral mechanisms
  17.2 How are non-neutral genetic code changes selected?
    17.2.1 Selenocysteine
    17.2.2 Pyrrolysine
    17.2.3 The CUG case in Candida spp.
  17.3 Cellular and molecular consequences of non-neutral genetic code alterations
    17.3.1 Consequences at proteome level
    17.3.2 Consequences at genome level
    17.3.3 Consequences at phenotypic level
  17.4 Conclusions and perspectives

Index

Foreword

The fundamental biological insights necessary to inspire mathematical modelling of codon evolution in protein-coding genes became available 50 years ago, when Crick, Brenner, Barnett, and Watts-Tobin confirmed the triplet nature of the genetic code in 1961. Presumably because the sequences that first became available were proteins, the first evolutionary models described amino acid replacements. In a series of pioneering studies starting in 1966, Dayhoff and colleagues applied an explicitly evolutionary approach to summarizing changes in protein sequences empirically. The PAM matrix highlights two major factors affecting the amino acid replacements that accumulate over evolutionary time: the mutational distance as determined by the genetic code (that is, amino acids that can be reached by a single nucleotide mutation replace each other far more often than those that are separated by two or three positional differences), and the physico-chemical distance (that is, similar amino acids replace each other far more often than dissimilar amino acids). The matrix became an instant classic and is still widely used, alongside more recent analogues estimated from huge databases or for particular groups of organisms or particular proteins or protein domains.

In parallel to this modelling based on the 20-letter alphabet of amino acids, modelling of DNA sequence evolution based on the four-letter nucleotide alphabet was initiated by Jukes and Cantor in 1969. Here, however, a parametric or mechanistic approach was used. Early models assumed that all replacements occur at the same rate and only gradually was greater parametric complexity introduced, inspired by observations such as unequal nucleotide frequencies or unequal transition and transversion rates. By the mid-1980s a suite of models existed, the most complex of which essentially allowed for any pattern of DNA sequence evolution to be modelled by a suitable choice of parameters.

For protein-coding genes, codons are the natural level to study the evolutionary process, as they permit consideration of both mutation processes at the DNA level and natural selection on the protein. A Markov chain model of codon evolution was described as early as 1975 by Jorré and Curnow, to predict amino acid frequencies in the protein. Sadly, this found no applications and seems largely forgotten. Increasing interest in using sequence data to study selection led to consideration of codon evolution by Miyata and Yasunaga, and by Gojobori in the early 1980s. However, inference under codon models requires working with a 61-letter alphabet of codons, which means roughly a (61/4)^3 or 3500-fold increase in computational time. The codon-modelling problem seemed too big, too slow, and simply too daunting until another decade had passed.

By 1990, the scene was set for codon models. That year, Schöniger and colleagues demonstrated that Dayhoff’s approach for proteins could equally be applied to codons. In 1994, two papers authored by Muse and Gaut, and by us, coincidentally appearing on consecutive pages of the journal Molecular Biology and Evolution, implemented codon models in a phylogenetic framework. Both took a parametric approach, describing possible codon changes using a small number of parameters. In the case of our own contribution, we hoped a few parameters would be enough to capture the major features of amino acid replacements: in particular, the mutational and physico-chemical distances that are important in affecting the relative replacement rates
