The Phylogenetic Handbook
Second Edition

The Phylogenetic Handbook provides a comprehensive introduction to theory and practice of nucleotide and protein phylogenetic analysis. This second edition includes seven new chapters, covering topics such as Bayesian inference, tree topology testing, and the impact of recombination on phylogenies. The book has a stronger focus on hypothesis testing than the previous edition, with more extensive discussions on recombination analysis, detecting molecular adaptation and genealogy-based population genetics. Many chapters include elaborate practical sections, which have been updated to introduce the reader to the most recent versions of sequence analysis and phylogeny software, including Blast, FastA, Clustal, T-coffee, Muscle, Dambe, Tree-Puzzle, Phylip, Mega4, Paup*, Iqpnni, Consel, ModelTest, ProtTest, Paml, HyPhy, MrBayes, Beast, Lamarc, SplitsTree, and Rdp3. Many analysis tools are described by their original authors, resulting in clear explanations that constitute an ideal teaching guide for advanced-level undergraduate and graduate students.

Philippe Lemey is a FWO postdoctoral researcher at the Rega Institute, Katholieke Universiteit Leuven, Belgium, where he completed his Ph.D. in Medical Sciences. He has been an EMBO Fellow and a Marie-Curie Fellow in the Evolutionary Biology Group at the Department of Zoology, University of Oxford. His research focuses on molecular evolution of viruses by integrating molecular biology and computational approaches.

Marco Salemi is Assistant Professor at the Department of Pathology, Immunology and Laboratory Medicine of the University of Florida School of Medicine, Gainesville, USA. His research interests include molecular epidemiology, intra-host virus evolution, and the application of phylogenetic and population genetic methods to the study of human and simian pathogenic viruses.
Anne-Mieke Vandamme is a Full Professor in the Medical Faculty at the Katholieke Universiteit, Belgium, working in the field of clinical and epidemiological virology. Her laboratory investigates treatment responses in HIV-infected patients and is respected for its scientific and clinical contributions to virus–drug resistance. Her laboratory also studies the evolution and molecular epidemiology of human viruses such as HIV and HTLV.

The Phylogenetic Handbook
A Practical Approach to Phylogenetic Analysis and Hypothesis Testing
Second Edition

Edited by
Philippe Lemey, Katholieke Universiteit Leuven, Belgium
Marco Salemi, University of Florida, Gainesville, USA
Anne-Mieke Vandamme, Katholieke Universiteit Leuven, Belgium

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521877107

© Cambridge University Press 2009

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2009

ISBN-13 978-0-511-71963-9 eBook (NetLibrary)
ISBN-13 978-0-521-87710-7 Hardback
ISBN-13 978-0-521-73071-6 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

List of contributors
Foreword
Preface

Section I: Introduction

1 Basic concepts of molecular evolution
Anne-Mieke Vandamme
  1.1 Genetic information
  1.2 Population dynamics
  1.3 Evolution and speciation
  1.4 Data used for molecular phylogenetics
  1.5 What is a phylogenetic tree?
  1.6 Methods for inferring phylogenetic trees
  1.7 Is evolution always tree-like?

Section II: Data preparation

2 Sequence databases and database searching
Theory
Guy Bottu
  2.1 Introduction
  2.2 Sequence databases
    2.2.1 General nucleic acid sequence databases
    2.2.2 General protein sequence databases
    2.2.3 Specialized sequence databases, reference databases, and genome databases
  2.3 Composite databases, database mirroring, and search tools
    2.3.1 Entrez
    2.3.2 Sequence Retrieval System (SRS)
    2.3.3 Some general considerations about database searching by keyword
  2.4 Database searching by sequence similarity
    2.4.1 Optimal alignment
    2.4.2 Basic Local Alignment Search Tool (Blast)
    2.4.3 FastA
    2.4.4 Other tools and some general considerations
Practice
Marc Van Ranst and Philippe Lemey
  2.5 Database searching using ENTREZ
  2.6 Blast
  2.7 FastA

3 Multiple sequence alignment
Theory
Des Higgins and Philippe Lemey
  3.1 Introduction
  3.2 The problem of repeats
  3.3 The problem of substitutions
  3.4 The problem of gaps
  3.5 Pairwise sequence alignment
    3.5.1 Dot-matrix sequence comparison
    3.5.2 Dynamic programming
  3.6 Multiple alignment algorithms
    3.6.1 Progressive alignment
    3.6.2 Consistency-based scoring
    3.6.3 Iterative refinement methods
    3.6.4 Genetic algorithms
    3.6.5 Hidden Markov models
    3.6.6 Other algorithms
  3.7 Testing multiple alignment methods
  3.8 Which program to choose?
  3.9 Nucleotide sequences vs. amino acid sequences
  3.10 Visualizing alignments and manual editing
Practice
Des Higgins and Philippe Lemey
  3.11 Clustal alignment
    3.11.1 File formats and availability
    3.11.2 Aligning the primate Trim5α amino acid sequences
  3.12 T-Coffee alignment
  3.13 Muscle alignment
  3.14 Comparing alignments using the AltAVisT web tool
  3.15 From protein to nucleotide alignment
  3.16 Editing and viewing multiple alignments
  3.17 Databases of alignments

Section III: Phylogenetic inference

4 Genetic distances and nucleotide substitution models
Theory
Korbinian Strimmer and Arndt von Haeseler
  4.1 Introduction
  4.2 Observed and expected distances
  4.3 Number of mutations in a given time interval *(optional)
  4.4 Nucleotide substitutions as a homogeneous Markov process
    4.4.1 The Jukes and Cantor (JC69) model
  4.5 Derivation of Markov Process *(optional)
    4.5.1 Inferring the expected distances
  4.6 Nucleotide substitution models
    4.6.1 Rate heterogeneity among sites
Practice
Marco Salemi
  4.7 Software packages
  4.8 Observed vs. estimated genetic distances: the JC69 model
  4.9 Kimura 2-parameters (K80) and F84 genetic distances
  4.10 More complex models
    4.10.1 Modeling rate heterogeneity among sites
  4.11 Estimating standard errors using Mega4
  4.12 The problem of substitution saturation
  4.13 Choosing among different evolutionary models

5 Phylogenetic inference based on distance methods
Theory
Yves Van de Peer
  5.1 Introduction
  5.2 Tree-inference methods based on genetic distances
    5.2.1 Cluster analysis (UPGMA and WPGMA)
    5.2.2 Minimum evolution and neighbor-joining
    5.2.3 Other distance methods
  5.3 Evaluating the reliability of inferred trees
    5.3.1 Bootstrap analysis
    5.3.2 Jackknifing
  5.4 Conclusions
Practice
Marco Salemi
  5.5 Programs to display and manipulate phylogenetic trees
  5.6 Distance-based phylogenetic inference in Phylip
  5.7 Inferring a Neighbor-Joining tree for the primates data set
    5.7.1 Outgroup rooting
  5.8 Inferring a Fitch–Margoliash tree for the mtDNA data set
  5.9 Bootstrap analysis using Phylip
  5.10 Impact of genetic distances on tree topology: an example using Mega4
  5.11 Other programs

6 Phylogenetic inference using maximum likelihood methods
Theory
Heiko A. Schmidt and Arndt von Haeseler
  6.1 Introduction
  6.2 The formal framework
    6.2.1 The simple case: maximum-likelihood tree for two sequences
    6.2.2 The complex case
  6.3 Computing the probability of an alignment for a fixed tree
    6.3.1 Felsenstein’s pruning algorithm
  6.4 Finding a maximum-likelihood tree
    6.4.1 Early heuristics
    6.4.2 Full-tree rearrangement
    6.4.3 DNAml and fastDNAml
    6.4.4 PhyML and PhyML-SPR
    6.4.5 Iqpnni
    6.4.6 RAxML
    6.4.7 Simulated annealing
    6.4.8 Genetic algorithms
  6.5 Branch support
  6.6 The quartet puzzling algorithm
    6.6.1 Parameter estimation
    6.6.2 ML step
    6.6.3 Puzzling step
    6.6.4 Consensus step
  6.7 Likelihood-mapping analysis