RESEARCH ARTICLES

Identifying the Basal Angiosperm Node in Chloroplast Genome Phylogenies: Sampling One's Way Out of the Felsenstein Zone

Jim Leebens-Mack,* Linda A. Raubeson,† Liying Cui,* Jennifer V. Kuehl,‡ Matthew H. Fourcade,‡ Timothy W. Chumley,§ Jeffrey L. Boore,‡‖ Robert K. Jansen,§ and Claude W. dePamphilis*

*Department of Biology, Institute of Molecular Evolutionary Genetics, and The Huck Institutes of Life Sciences, The Pennsylvania State University; †Department of Biological Sciences, Central Washington University; ‡Department of Energy Joint Genome Institute and University of California Lawrence Berkeley National Laboratory, Walnut Creek, California; §Section of Integrative Biology and Institute of Cellular and Molecular Biology, University of Texas, Austin; and ‖Department of Integrative Biology, University of California, Berkeley

While there has been strong support for Amborella and Nymphaeales (water lilies) as branching from basal-most nodes in the angiosperm phylogeny, this hypothesis has recently been challenged by phylogenetic analyses of 61 protein-coding genes extracted from the chloroplast genome sequences of Amborella, Nymphaea, and 12 other available land plant chloroplast genomes. These character-rich analyses placed the monocots, represented by three grasses (Poaceae), as sister to all other extant angiosperm lineages. We have extracted protein-coding regions from draft sequences for six additional chloroplast genomes to test whether this surprising result could be an artifact of long-branch attraction due to limited taxon sampling. The added taxa include three monocots (Acorus, Yucca, and Typha), a water lily (Nuphar), a ranunculid (Ranunculus), and a gymnosperm (Ginkgo). Phylogenetic analyses of the expanded DNA and protein data sets together with microstructural characters (indels) provided unambiguous support for Amborella and the Nymphaeales as branching from the basal-most nodes in the angiosperm phylogeny. However, their relative positions proved to be dependent on the method of analysis, with parsimony favoring Amborella as sister to all other angiosperms and maximum likelihood (ML) and neighbor-joining methods favoring an Amborella + Nymphaeales clade as sister. The ML phylogeny supported the latter hypothesis, but the likelihood for the former hypothesis was not significantly different. Parametric bootstrap analysis, single-gene phylogenies, estimated divergence dates, and conflicting indel characters all help to illuminate the nature of the conflict in resolution of the most basal nodes in the angiosperm phylogeny. Molecular dating analyses provided median age estimates of 161 MYA for the most recent common ancestor (MRCA) of all extant angiosperms and 145 MYA for the MRCA of monocots, magnoliids, and eudicots. Whereas long sequences reduce variance in branch lengths and molecular dating estimates, the impact of improved taxon sampling on the rooting of the angiosperm phylogeny together with the results of parametric bootstrap analyses demonstrate how long-branch attraction might mislead genome-scale phylogenetic analyses.

Key words: angiosperm phylogeny, phylogenomics, parametric bootstrap.
E-mail: [email protected].
Mol. Biol. Evol. 22(10):1948–1963. 2005
doi:10.1093/molbev/msi191
Advance Access publication June 8, 2005
Published by Oxford University Press 2005.

Introduction

Characterized by Darwin (1903) as “an abominable mystery,” the early radiation of angiosperms was pivotal in the evolutionary history of our biota. After years of controversy concerning identity of the basal-most node in the angiosperm phylogeny, a series of studies using multiple genes from the chloroplast, mitochondrial, and nuclear genomes identified Amborella, the Nymphaeales, and Austrobaileyales as successive sister lineages relative to all other angiosperms (Mathews and Donoghue 1999; Parkinson, Adams, and Palmer 1999; Qiu et al. 1999; P. S. Soltis, D. E. Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Zanis et al. 2002; Borsch et al. 2003; Hilu et al. 2003; Stefanović, Rice, and Palmer 2004; e.g., fig. 1A). Many of these studies found that the inferred relationship between Amborella and the Nymphaeales varied when using differing methods of phylogenetic reconstruction, models of molecular evolution, and subsets of taxa (Barkman et al. 2000; Graham and Olmstead 2000; Zanis et al. 2002; Stefanović, Rice, and Palmer 2004), with each lineage sometimes inferred to be most basal and, in some cases, for the two to form a single clade, sister to all other angiosperm lineages (e.g., fig. 1B and C). While the branching order of Amborella and the Nymphaeales relative to each other and the rest of the angiosperms has remained controversial, there has been widespread consensus in the recent plant systematics literature for Amborella and Nymphaeales branching off at the base of the angiosperm phylogeny, followed subsequently by Austrobaileyales and the remaining angiosperm lineages.

That consensus was recently challenged by the results of phylogenetic analyses of 61 protein-coding genes common to 14 chloroplast genome sequences including the recently sequenced plastid genomes of Amborella trichopoda (Goremykin et al. 2003) and Nymphaea alba (Goremykin et al. 2004). The analyses of Goremykin and colleagues placed the monocots, represented by the chloroplast genomes of rice (Oryza sativa), maize (Zea mays), and wheat (Triticum aestivum), as sister to all other extant lineages in the angiosperm phylogeny. This result is quite intriguing because much of our current understanding of morphological, developmental, and molecular evolution in early angiosperm history would have to change if monocots are in fact sister to all other angiosperms (e.g., fig. 1D). As Goremykin et al. (2004, p. 1452) point out, however, the hypothesized basal divergence of monocots from all other angiosperms must be tested in analyses with increased taxon sampling. Genome-scale phylogenetic studies, where character sampling is very deep but taxon sampling is typically sparse, are particularly susceptible to long-branch
attraction (Soltis et al. 2004b; Philippe, Lartillot, and Brinkmann 2005), and multiple lines of evidence suggest that the placement of the grasses in the analyses of Goremykin et al. (2003, 2004) may be an artifact of sparse taxon sampling, particularly within the monocots (D. E. Soltis and P. S. Soltis 2004; Soltis et al. 2004b; Stefanović, Rice, and Palmer 2004; but see Lockhart and Penny 2005; Martin et al. 2005). Here we test this hypothesis directly by adding the corresponding 61-gene data sets for six additional species, including three nongrass monocot species, a ranunculid (the sister clade to all other eudicots), an additional water lily, and the gymnosperm Ginkgo biloba. Our study also includes an evaluation of microstructural mutations, a class of evolutionary events that may be less subject to homoplastic events than base substitutions. In addition, we extend the earlier work of Zanis et al. (2002) using parametric bootstrap analyses to investigate estimated branching events at the base of the angiosperms and estimate minimum divergence dates for well-supported clades on the phylogeny.

FIG. 1. Nucleotide-based ML phylogenies estimated using the HKY + Γ + I substitution model while constraining Amborella (A), Amborella + water lilies (B), water lilies (C), or monocots (D) as sister to remaining angiosperms. Likelihood for each hypothesis is shown with each phylogeny.

Materials and Methods

Sequencing

Templates suitable for constructing random insert plasmid libraries were generated for the chloroplast genomes of G. biloba, Nuphar advena, Acorus americanus, Yucca schidigera, Typha latifolia, and Ranunculus macranthus in one of three ways (Jansen et al. 2005): (1) sucrose gradient isolation of pure chloroplast DNA (cpDNA) (Ginkgo, Nuphar, and Ranunculus), (2) rolling circle amplification of the entire plastid genome using plastid isolations (Acorus and Typha), and (3) cpDNA-containing clones identified through screening of a fosmid genomic library (Yucca). The low-coverage (<0.5×) Yucca fosmid library was produced using the Epicentre CopyControl™ Fosmid Library Production Kit according to manufacturer's protocols (http://www.epibio.com). All templates were sheared by serial passage through a narrow aperture using a Hydroshear® device (Genomic Solutions, Ann Arbor, Mich.) into random fragments of approximately 3 kb, and then cloned into the pUC18 plasmid vector to create a clone library. These clones were robotically processed through colony picking, rolling circle amplification using TempliPhi™ (Amersham Biosciences, Little Chalfont, UK), sequencing reactions with either ET terminators (Amersham Biosciences) or BigDye™ terminators (Applied Biosystems, Foster City, Calif.), and then reads of approximately 700 nt each were determined from each end of each clone. Detailed protocols are available at http://www.jgi.doe.gov/sequencing/protocols/prots_production.html. Sequence reads were trimmed and assembled using phred and phrap (Ewing and Green 1998; Ewing et al. 1998) and manually interpreted using Consed (Gordon, Abajian, and Green 1998) and Sequencher (http://www.genecodes.com). Roughly 4,000 sequencing reads were determined for each chloroplast genome, giving about 2.8 million nucleotides of sequence (at Q20 or greater quality), for an average depth of coverage in the assemblies of about 12× (considering losses due to impure cpDNA preparations). The sequence of each cpDNA was finished to meet the quality criteria specified in Jansen et al. (2005) by additional sequencing reactions that targeted gaps or parts of the genomes with potential errors using primers specifically designed for targeted portions of the plastid genome.

Phylogenetic Analyses

The 61 genes included in the analyses of Goremykin et al. (2003, 2004) were extracted from high-quality contigs (Q values > 40 for all extracted bases) of our six new chloroplast genome sequences using the organellar genome annotation program DOGMA (http://evogen.jgi-psf.org; Wyman, Jansen, and Boore 2004). The same set of 61 genes was extracted from chloroplast genome sequences for the 18 other available species (table 1). Inferred amino acid sequences for each of the 61 genes were aligned using ClustalW (Thompson, Higgins, and Gibson 1994) and adjusted manually. A nucleotide alignment was then forced to correspond to the amino acid alignment and further adjusted. The complete amino acid and nucleotide alignments are available in our Chloroplast Genome Database (http://chloroplast.cbio.psu.edu/), and sequences are available in GenBank (accession numbers DQ069337–DQ069702; table 1).

Table 1
GenBank Accession Numbers for the Sequences Included in This Study

Taxon (Collection locale: voucher ID)                           GenBank Accession Numbers   Reference

Bryophytes
  Anthoceros formosae                                           NC_004543                   Kugita et al. 2003b
  Physcomitrella patens                                         NC_005087                   Sugiura et al. 2003
  Marchantia polymorpha                                         NC_001319                   Ohyama et al. 1988
Ferns and allies
  Psilotum nudum                                                NC_003386                   Wakasugi et al. 1998
  Adiantum capillus-veneris                                     NC_004766                   Wolf et al. 2003
Gymnosperms
  Pinus thunbergii                                              NC_001631                   Wakasugi et al. 1994
  Ginkgo biloba (Travis Co., Tex.; RCH155 TEX)                  DQ069337–DQ069702           Current study
Basal-most angiosperm lineages (a)
  Amborella trichopoda                                          NC_005086                   Goremykin et al. 2003
  Nymphaea alba                                                 NC_006050                   Goremykin et al. 2004
  Nuphar advena (Centre Co., Pa.; PAC95537 PAC)                 DQ069337–DQ069702           Current study
Monocots
  Acorus americanus (Crawford Co., Pa.; PAC95538 PAC)           DQ069337–DQ069702           Current study
  Typha latifolia (Yavapai Co., Ariz.; RCH188 TEX)              DQ069337–DQ069702           Current study
  Yucca schidigera (San Diego Co., Calif.; jlm-yuc370 PAC)      DQ069337–DQ069702           Current study
  Saccharum officinarum                                         NC_006084                   Asano et al. 2004
  Zea mays                                                      NC_001666                   Maier et al. 1995
  Oryza sativa                                                  NC_001320                   Hiratsuka et al. 1989
  Triticum aestivum                                             NC_002762                   Ogihara et al. 2000
Magnoliids
  Calycanthus floridus                                          NC_004993                   Goremykin et al. 2003
Eudicots
  Ranunculus macranthus (Travis Co., Tex.; RCH1184 TEX)         DQ069337–DQ069702           Current study
  Nicotiana tabacum                                             NC_001879                   Shinozaki et al. 1986
  Atropa belladonna                                             NC_004561                   Schmitz-Linneweber et al. 2002
  Spinacia oleracea                                             NC_002202                   Schmitz-Linneweber et al. 2001
  Lotus corniculatus                                            NC_002694                   Kato et al. 2000
  Medicago truncatula                                           NC_003119                   Lin S., H. Wu, H. Jia, P. Zhang, R. Dixon, G. May, R. Gonzales, and B. A. Roe, unpublished data
  Arabidopsis thaliana                                          NC_000932                   Sato et al. 1999
  Oenothera elata                                               NC_002693                   Hupfer et al. 2000

(a) Basal angiosperm lineages as determined in most molecular systematic studies since 1999 (see text).

Phylogenetic analyses of nucleotide alignments using maximum parsimony (MP) and Neighbor-Joining (NJ) were performed using PAUP* version 4.10 (Swofford 2003), and maximum likelihood (ML) analyses were performed using both PAUP* and PHYML version 2.4.4 (Guindon and Gascuel 2003). All MP searches were heuristic with 10 random addition replicates and Tree Bisection-Reconnection branch swapping. The Hasegawa-Kishino-Yano (HKY) (Hasegawa, Kishino, and Yano 1985) model of molecular evolution was used in ML and NJ analyses of the nucleotide alignments. ML estimates of HKY distances were used for the NJ analyses. ML and NJ analyses of amino acid alignments were performed using PHYML and PHYLIP version 3.63 (seqboot, protdist, and neighbor; Felsenstein 2004), respectively, under the Jones-Taylor-Thornton (JTT) model (Jones, Taylor, and Thornton 1992).
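The step described above, in which the nucleotide alignment is forced to correspond to the amino acid alignment, can be illustrated with a minimal Python sketch. This is not the authors' implementation (the text does not name the tool or script that was used). The function name `thread_codons`, the taxon labels, and the toy sequences are all hypothetical, and the sketch assumes an unaligned coding sequence with no stop codon and a gap character of "-".

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Return the coding sequence laid out codon-by-codon under an aligned protein.

    Each residue in the aligned protein consumes the next codon of the
    unaligned coding sequence; each protein gap becomes a three-column gap.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths disagree")
    codon_iter = iter(codons)
    return "".join(
        gap * 3 if aa == gap else next(codon_iter) for aa in aligned_protein
    )


# Toy example (hypothetical sequences, not data from the study).
protein_alignment = {"taxonA": "MK-L", "taxonB": "M-RL"}
coding_seqs = {"taxonA": "ATGAAACTT", "taxonB": "ATGCGTCTT"}

nucleotide_alignment = {
    name: thread_codons(protein_alignment[name], coding_seqs[name])
    for name in protein_alignment
}
for name, row in nucleotide_alignment.items():
    print(name, row)
```

Because every protein column maps to exactly three nucleotide columns, codon positions stay in register across taxa, which is what makes it possible to analyze first, second, and third positions separately.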
ML and NJ analyses of the nucleotide alignment were run with and without rate variation among sites (HKY + Γ), and with and without invariant sites (HKY + Γ + I). ML and NJ analyses of the amino acid alignment were run with and without rate variation among sites (JTT + Γ), and ML analyses were also run with and without invariant sites (JTT + Γ + I). Rate variation among sites was estimated as a discrete gamma distribution (Yang 1994) with six rate classes. ML parameter estimates for rate variation across sites and for invariant sites were optimized simultaneously with topology and branch lengths in PHYML. In order to avoid being trapped in local optima, all ML analyses performed in PHYML were run with five starting trees including the BIONJ tree (default) and the four trees shown in figure 1. Likelihood ratio tests showed significant improvement in the fits of both nucleotide and protein evolution models with the addition of parameters for rate variation across sites and invariant sites (P << 0.001). Non-parametric bootstrap analyses (Felsenstein 1985) were performed for all analyses with 200 pseudoreplicates.

In addition to the typical ML analysis, constrained likelihood trees were estimated for the nucleotide alignment setting one of four lineages as sister to the remaining angiosperms: Amborella (fig. 1A), Amborella + Nymphaeales (fig. 1B), Nymphaeales (fig. 1C), and monocots (fig. 1D; Goremykin et al. 2003, 2004). The Shimodaira-Hasegawa (SH) (Shimodaira and Hasegawa 1999) test was performed in PAUP* to determine whether any of the resulting phylogenies were significantly worse than the ML tree. The test was performed using 10,000 bootstrap replicates. MP topologies were tested similarly in PAUP* using the Wilcoxon test.

In their analyses, Goremykin et al. (2003, 2004) removed third codon positions from their nucleotide alignments citing concerns over saturation of nucleotide substitutions at synonymous sites. Pairwise genetic distances for all taxon pairs in our analysis (HKY model) were calculated separately for codon positions 1 + 2 and position 3. A linear relationship was observed between genetic divergence (HKY distances) at third versus first and second positions (fig. 2), suggesting that saturation was not seriously biasing distance estimates, so third positions were included in our phylogenetic analyses. The inclusion of third codon positions changed bootstrap values for some nodes, but with the exception of the poorly supported placement of Calycanthus, the optimal MP, ML, and NJ topologies were the same in 61-gene analyses with and without their inclusion (see Results).

FIG. 2. Pairwise HKY distances estimated for third codon positions are linearly correlated with distances estimated for first and second positions.

Initial analyses of codon usage, amino acid content, and branch lengths showed that the fern Adiantum and hornwort Anthoceros sequences are extremely divergent from their closest relatives in the study, in part due to extensive RNA editing (Kugita et al. 2003b; Wolf et al. 2003). These taxa, which were not critical to our study, were eliminated from further analyses. Gapped sites were also excluded from additional analyses because these often represented regions of questionable annotation and alignment. An exception was made for well-aligned sites in rpoA and ccsA, which are missing from the plastome of Physcomitrella patens and were coded as gap for this taxon. The resulting nucleotide and amino acid alignments included 39,978 and 13,326 characters, respectively. Exclusion of Adiantum, Anthoceros, or gapped sites affected only the poorly supported placement of Calycanthus relative to the eudicots in MP, NJ, and ML topologies (data not shown).

Ané et al. (2005) have recently shown among-lineage rate heterogeneity, or heterotachy (Lopez, Casane, and Philippe 2002), at many sites in plant plastid genes. This type of rate variation can confound phylogenetic analyses given some patterns of variation (e.g., Kolaczkowski and Thornton 2004; Spencer, Susko, and Roger 2005). We attempted to control for heterotachy in distance analyses of both nucleotide and amino acid alignments by estimating LogDet distances (Lake 1994; Lockhart et al. 1994; Steel 1994) with and without among-site rate heterogeneity using LDDist (Thollesson 2004; Martin et al. 2005).

Parsimony analyses were also performed on all 61 genes individually. Analyses were performed on alignments of all three codon positions, first and second codon positions, and amino acids. Gapped positions were not included in the analyses. Bootstrap analyses were performed with 250 replicates, and all the resulting phylogenies were inspected to identify the most basal angiosperm lineage in cases where one clade was identified as sister to all other angiosperms with at least 50% bootstrap support.

Whereas alignment in many gapped regions was problematic, there were many regions where the homology of insertions or deletions could be assigned unambiguously. A separate data matrix of insertions and deletions was constructed from these regions, and parsimony analysis was performed on the binary data matrix.

Parametric Bootstrap Analyses

While much has been made of Amborella as the most basal clade in the angiosperm phylogeny (Mathews and Donoghue 1999; Parkinson, Adams, and Palmer 1999; Qiu et al. 1999; P. S. Soltis, D. E. Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Zanis et al. 2002; Borsch et al. 2003; Hilu et al. 2003), some analyses in these studies and others have found support for the placement of Amborella and Nymphaeales in a clade placed sister to all other angiosperms, whereas others have shown evidence for Nymphaeales alone as the most basal clade in the angiosperm phylogeny. Barkman et al. (2000) first showed that the placement of Amborella relative to Nymphaeales could vary among data sets and methods of phylogenetic reconstruction and predicted that incongruence among phylogenies estimated using different methods would increase as additional sequence data were added for each taxon in genome-scale analyses due to the effects of long-branch attraction (fig. 5 of Barkman et al. 2000). Following the comprehensive investigation of Zanis et al. (2002), we used parametric bootstrap analyses to explore whether long-branch attraction might be responsible for the observed conflict and determine if any of the four hypotheses illustrated in figure 1 could be rejected given the large number of chloroplast genome sequences in our study.

ML phylogenies were estimated in PAUP* with constraint trees corresponding to one of four hypothesized lineages as sister to all other extant angiosperms: (1) Amborella, (2) an Amborella + Nymphaeales clade, (3) Nymphaeales, and (4) monocots (fig. 1). Parameter values for the HKY model with invariant sites and among-site rate variation were estimated in PHYML as described above. These parameter values and each of the four ML phylogenies shown in figure 1 were used to generate 200 simulated data sets with Seq-Gen version 1.3 (Rambaut and Grassly 1997). MP and NJ analyses were performed on all simulated data sets in PAUP*, and ML analyses were performed using PHYML with five starting trees as described above. Model parameters for the ML and NJ analyses matched those used to simulate the data. The frequencies with which hypothesis a, b, c, or d was observed in the estimated phylogenies were calculated for each set of simulations and each phylogenetic methodology.

Dating Nodes

If some speciation events that gave rise to basal angiosperm lineages were separated by just a few million years, it may be difficult to resolve these events even with the large amount of data included in genome-scale studies with adequate taxon sampling. In order to explore timing of major events in angiosperm history, the ML phylogenies generated for each of the 200 nonparametric replicates were saved with branch lengths estimated to eight significant digits, and minimum ages were estimated for nodes on each phylogeny using the penalized likelihood method implemented in r8s (Sanderson 2003). The origin of the eudicots 125 MYA as evidenced by the appearance of tricolpate pollen in the fossil record (Crane, Friis, and Pederson 1995; Sanderson et al. 2004) was used as the fixed calibration point. In addition, minimum and maximum ages for the origin of the euphyllophytes were set at 380 and 410 MYA, respectively (Schneider et al. 2004), and minimum ages for the Poaceae and the most recent common ancestor (MRCA) of Ginkgo and Pinus were 55 MYA (Kellogg 2000) and 310 MYA (Schneider et al. 2004), respectively. Averages, medians, and standard errors were estimated for the MRCA defined by the nodes in the ML tree across all 200 bootstrap trees.

Results

Monocots Are not Sister to All Other Extant Angiosperms

As Felsenstein (1978) famously deduced, there is a tendency for long branches to artifactually attract (referred to as “long-branch attraction”) in some phylogenetic analyses, and under these conditions, the probability of inferring the wrong phylogeny will increase with additional data due to statistical inconsistency. Hendy and Penny (1989) showed that even when rates of evolution are constant among lineages, long-branch attraction may confound parsimony analyses with more than four taxa and trees with variable terminal branch lengths. Because the out-group is almost always a long branch, this often manifests as the longer branch in-group taxa being drawn to the base of the tree even when this is not the correct relationship (Philippe and Laurent 1998). In such instances, adding taxa to interrupt the longest branches can help parsimony to converge on the correct phylogeny (Hendy and Penny 1989).

By adding nongrass monocots and another gymnosperm to the analysis, we interrupted two of the longest branches (fig. 3). As a result, all MP and ML analyses placed branching points for Amborella and Nymphaeales at the base of the angiosperm phylogeny with strong support (fig. 4). The most parsimonious nucleotide-based phylogeny constrained to place monocots sister to all other angiosperms was 242 steps longer than the unconstrained MP phylogeny (fig. 4B). A Wilcoxon signed ranks test showed this difference to be highly significant (P < 0.0001). ML analyses with among-site rate variation placed Amborella + Nymphaeales sister to all other angiosperms (fig. 4A), whereas the ML analysis without rate variation placed Amborella as sister to all extant angiosperms (supplementary fig. 1A and B, see Supplementary Material). The log-likelihood score for the optimal HKY-Γ + I likelihood tree in an analysis constrained to place monocots sister to all other angiosperms (−ln(L) = 318,423.61) was 310.73 points greater than the score for the unconstrained ML tree (−ln(L) = 318,112.88; figs. 1B and 4A). Based on the Shimodaira and Hasegawa (S-H; 1999) test, this difference in likelihood scores was highly significant (P < 0.0001). The NJ analyses with among-site rate variation found strong support for an Amborella + Nymphaeales clade sister to all other angiosperms (fig. 4C), but the NJ analysis under the simpler model without variation in rates among sites placed the eudicots as sister to the remaining angiosperms (supplementary fig. 1C, see Supplementary Material). NJ analyses with the simple HKY model performed after removing the most distant out-group taxa, Physcomitrella, Marchantia, and Psilotum, placed an Amborella + water lilies clade as sister to the remaining angiosperms (supplementary fig. 1D, see Supplementary Material). ML and NJ analyses performed on the nucleotide data matrix using the more complicated GTR-Γ + I model gave the same topologies shown in figure 4 with bootstrap values ±2% relative to the results of the HKY-Γ + I analysis (data not shown).

FIG. 3. Comparison of unrooted MP phylogenies estimated from amino acid alignments with the taxon set analyzed by Goremykin et al. (2004, A) and in this study (B) reveals the long branch leading to grasses.

FIG. 4. Single MP (A), ML (B), and NJ (C) phylogenies estimated from the 61-gene nucleotide alignment are shown. Most nodes on each phylogeny were recovered in 100% of the bootstrap replicates, and only values <100% are shown for each node. All analyses place Amborella and the water lilies as basal lineages in the angiosperm phylogeny.

Analyses of the amino acids and first and second codon positions also placed Amborella and Nymphaeales at the base of the angiosperm phylogeny with strong support in the MP and ML analyses (supplementary figs. 2 and 3, respectively, see Supplementary Material). NJ analyses of the first and second positions gave the same topology as the third codon position analysis, but the amino acid analysis placed eudicots as sister to the remaining angiosperms giving weak support for a (monocots (Calycanthus, (Amborella, Nymphaeales))) clade (supplementary figs. 2 and 3, respectively, see Supplementary Material). After removing Physcomitrella, Marchantia, and Psilotum from the amino acid alignment, NJ analyses recovered an Amborella + water lilies clade as sister to all other angiosperms. The placement of Calycanthus, which was poorly supported in all but two NJ analyses, differed among all three phylogenetic methods in nucleotide analyses (fig. 4). Whereas bootstrap support for the placement of Calycanthus was ≤65% in all MP and ML analyses, Calycanthus was placed sister to the monocots with support values of 100% and 90% in NJ analyses of the ungapped nucleotide and first and second codon position alignments, respectively. Despite high support values in these NJ analyses, additional sequences sampled from magnoliid orders other than Laurales (Magnoliales, Piperales, and Canellales), a member of the Chloranthaceae, and Ceratophyllum will be necessary to have any hope of resolving the relationships among these taxa, the monocots and eudicots, with confidence.

NJ analyses of LogDet distance matrices estimated from nucleotide and amino acid alignments assuming no among-site rate variation, six rate classes, and invariant sites (ML estimate of 26%) plus six additional rate classes all produced the same topology placing the core eudicots sister to a clade with all other eudicots including the eudicot Ranunculus (supplementary fig. 4, see Supplementary Material). The monophyly of the eudicots including the ranunculids as sister to the core eudicots is well established (Judd and Olmstead 2004), so this topology is quite unlikely to represent the true phylogeny. Moreover, analyses of nucleotide alignments including only seed plants gave strong support for an Amborella + water lilies clade as sister to the remaining angiosperms (supplementary fig. 4E and F, see Supplementary Material). The amino acid analysis of the seed plants, however, placed eudicots (including Ranunculus) sister to the remaining angiosperms (supplementary fig. 4G and H, see Supplementary Material). These results are noteworthy as they add to a growing set of examples underscoring the need for deeper understanding of how covarion/covariotide evolution and other forms of heterotachy can be diagnosed and modeled in phylogenetic analyses (e.g., Lockhart et al. 1998, 1999; Huelsenbeck 2002; Lopez, Casane, and Philippe 2002; Kolaczkowski and Thornton 2004; Phillips, Delsuc, and Penny 2004; Ané et al. 2005; Martin et al. 2005; Spencer, Susko, and Roger 2005).

In general, the relationships common to all trees in figure 4 were found in all ML trees and most NJ trees inferred from nucleotide and amino acid alignments given a variety of substitution models. Most of the exceptions seen in the NJ trees were only found in analyses that included distantly related out-group taxa. Most importantly, conflicts among topologies derived using different phylogenetic methods, substitution models, or taxon sets were usually supported by bootstrap values greater than 90%. This observation underscores how statistical inconsistency is a serious problem for phylogenomics that can only be diagnosed through comprehensive analyses using a variety of phylogenetic methods, substitution models, distance corrections, and taxon sets (see also Phillips, Delsuc, and Penny 2004).

Amborella or Amborella Plus Nymphaeales?

As has been described in previous studies (Barkman et al. 2000; Zanis et al. 2002; Stefanović, Rice, and Palmer 2004), the relationship between Amborella and the Nymphaeales relative to the rest of the angiosperms depended on the method of analysis. In our MP analyses, Amborella was placed sister to all other extant angiosperms with high support (fig. 4B). A tree constrained to have a Nymphaeales + Amborella clade was 83 steps longer than the MP tree, and this difference was found to be significant in a Wilcoxon signed ranks test (P = 0.0007). Examination of the characters potentially supporting an Amborella alone versus a Nymphaeales + Amborella clade showed that characters were distributed evenly across the 61-gene alignment. However, despite strong support for the basal branching point for Amborella under MP, the NJ and ML analyses unite Nymphaeales with Amborella in a clade that is sister to the rest of the angiosperms. The ML analysis gave moderate support for this relationship (fig. 4B), whereas the NJ analysis provided strong support (fig. 4C). The likelihood score (−ln(L) = 318,117.04) for a tree constrained to place Amborella as sister to all other angiosperms was not significantly different from that of the ML tree (P = 0.601; scores shown in fig. 1). Furthermore, ML analyses of the amino acid alignments provided weak support for Amborella as sister to the remaining angiosperms (supplementary fig. 3, see Supplementary Material). The tree placing the water lilies as sister to the rest of the angiosperms gave a significantly worse likelihood score than the ML phylogeny (P = 0.028).

Most single-gene analyses did not resolve basal relationships among angiosperm lineages (table 2). In all character sets, however, when one angiosperm lineage was placed sister to the others, it was most often Amborella or Amborella + Nymphaeales (table 2). The monocots were not resolved as sister to the other angiosperms in any of the single-gene phylogenies, and Poaceae or Poales (Poaceae + Typha) were estimated as the most basal clade for only four genes in analyses of codon positions 1 and 2. A few of the single-gene analyses supported relationships that are clearly untenable (table 2). For example, a weakly supported Poaceae + Oenothera clade was placed sister to the rest of the angiosperms in the analysis of clpP nucleotides. The most basal angiosperm lineage inferred for some genes varied across the three character sets. For example, the most basal lineages in the psbB phylogenies estimated from all nucleotides, first and second codon positions, and amino acids were core eudicots, Poales, and Typha, respectively.

Table 2
The Lineage(s) That Is Inferred to Be Most Basal in Angiosperms Was Ambiguous in Most Single-Gene Analyses

Columns: Taxon; Codon Positions 1 + 2 + 3; Codon Positions 1 + 2; Amino Acids

Poaceae + Oenothera: 1 (clpP [80])
Poaceae: 3 (rpoB [50], psbK [54], rps3 [53]); 2 (rpoC1 [86], rps12 [64])
Typha: 1 (psbB [53])
Poales (Typha + Poaceae): 1 (psbB [53])
Eudicots: 1 (rps2 [82])
Core Eudicots: 2 (psbB [53], psbD [54])
Oenothera: 1 (rps2 [50]); 1 (rps2 [69])
Spinacia: 1 (petD [53]); 1 (rps15 [66])
Amborella: 5 (atpE [61], atpF [56], psaA [59], rbcL [59], rpoA [86]); 3 (atpE [58], rpoA [64], cemA [68]); 2 (atpE [57], cemA [64])
Nymphaeales: 1 (rpoB [69])
Nymphaeales + Amborella: 3 (rpoC1 [66], rpoC2 [74], rps4 [65]); 2 (rpoC2 [73], matK [51]); 1 (ccsA [53])
Unresolved: 48; 50; 54
Total: 61; 61; 61

NOTE. The identities of basal lineages inferred with at least 50% bootstrap support in the MP analyses are shown for the nucleotide and amino acid alignments with all gapped positions removed. Bootstrap support (percent) for the basal position of the indicated lineage is shown in square brackets for each gene.

The MP analysis of insertions and deletions also provided strong support for Amborella and Nymphaeales as branching from the most basal nodes in the angiosperm phylogeny (fig. 5 and supplementary tables 1 and 2, see Supplementary Material). A total of 116 potentially phylogenetically informative indels were included in the analysis. Four unambiguous indels, consisting of three insertions and one deletion mutation, supported the monophyly of all sampled angiosperms except Amborella and Nymphaeales (fig. 6). No indel characters were potentially supportive of grasses or monocots in the basal-most position. Twelve equally parsimonious trees (141 steps) were recovered in
the MP analysis of the indel characters. Half of the MP trees included an Amborella + Nymphaeales clade, and half placed Amborella alone as sister to all other angiosperm lineages. Each of these two hypotheses was supported by a single synapomorphy (fig. 6). The resolved portions of the indel phylogeny were identical to corresponding relationships inferred through comparisons of nucleotide and amino acid sequences.

A large number of the microstructural mutations distinguish grasses from all other plants in the study (fig. 5B), whereas other lineages have many fewer insertion-deletion mutations. This suggests that lineage-specific common processes may have given rise to the enhanced rates of nucleotide substitution and indel evolution in the lineage leading to the Poaceae, and perhaps generally throughout the major lineages of land plant evolution.

FIG. 5.—Parsimony analysis of microstructural mutations in 61 coding regions supports Amborella and water lilies as the most basal lineages of angiosperms: (A) bootstrap consensus of 116 parsimony-informative insertion and deletion characters, (B) phylogram of 1 of the 12 most parsimonious trees (141 steps) with branch lengths drawn proportional to the number of inferred insertion and deletion mutations on each branch.

Parametric Bootstrap Analysis

The parametric bootstrap analysis demonstrated that the results of the MP analysis may have been affected by long-branch attraction (Felsenstein 1978; Hendy and Penny 1989). Whereas the ML and NJ analyses recovered the simulated topology in the vast majority of cases, the MP
analyses misidentified Amborella as sister to all other angiosperms in 21% of the cases where data were simulated under the Amborella + Nymphaeales phylogeny (table 3). When constrained to place monocots sister to the rest of the angiosperms, the ML phylogeny included a very short internal branch leading to the node where an Amborella + Nymphaeales clade diverged from the remaining dicot lineages (fig. 1D). Data sets generated under this hypothesis were especially problematic for MP and NJ analyses. Only ML recovered the correct tree in more than half of the analyses (table 3). In exploratory parametric bootstrap analyses where Nymphaeales was only represented by the Nuphar sequence, both NJ and MP analyses failed to recover the correct topology for more than 97% of the data sets simulated under the Amborella + Nymphaeales phylogeny. The effect of adding Nymphaea to the analysis illustrates the strong influence of taxon sampling on phylogenetic reconstruction.

Table 3
Parametric Bootstrap Results Show that Parsimony Analyses Are More Subject to Long-Branch Attraction than the Model-Based Likelihood and Neighbor-Joining Analyses When Data Sets are Simulated on Phylogenies B (Amborella + Nymphaeales basal clade) and C (Nymphaeales basal-most clade)

Simulated Phylogeny(a); Likelihood (Amb, Amb+Nym, Nym, Monocot); Parsimony (Amb, Amb+Nym, Nym, Monocot); Neighbor Joining (Amb, Amb+Nym, Nym, Monocot)
A: 100, —, —; 100, —, —; 100, —, —
B: —, 100, —; 21, 79, —; —, 100, —
C: —, —, 100; 10, —, 90; 1, 1, 98
D(b): 1, 98, 48, 4, 29, 41

NOTE.—Parsimony and neighbor joining performed poorly when data were simulated on the ML tree forcing monocots as sister to all other angiosperms. Rate of recovering the simulated topology is shown in bold for each reconstruction method and simulated phylogeny. Amb, Amborella; Nym, Nymphaeales.
(a) Input phylogenies for the simulations are shown in figure 1.
(b) Rows do not sum to 100 for tree D because other angiosperm rootings were observed in the bootstrap trees, including monocots + Amborella + Nymphaeales, Calycanthus + monocots + Amborella + Nymphaeales, and Calycanthus as sister to remaining angiosperms.

FIG. 6.—Six phylogenetically informative insertion-deletion mutations bearing on the position of Amborella and water lilies relative to other angiosperms. Each mini alignment is identified by the gene name and position in the Amborella sequence of the first amino acid in the indel of interest. Four indels (A, B, C, and D) support the basal position of Amborella and water lilies; one indel (E) supports Amborella as the sole basal-most angiosperm lineage; one indel (F) supports the monophyly of Amborella plus water lilies. Thus, characters E and F are consistent with A–D but conflict with regard to the relationship between Amborella and water lilies. No phylogenetically informative indel characters were observed that would have supported a basal position for grasses or monocots in the angiosperms.

Molecular Clock Estimates

Molecular clock estimates for most angiosperm nodes in the ML topology were in line with recently published divergence dates estimated using a variety of procedures (Chaw et al. 2004; Davies et al. 2004; others reviewed in Sanderson et al. 2004; table 4; fig. 7). The median age estimates for the angiosperm crown group and the MRCA of all monocots, magnoliids, and eudicots were 161 MYA and 145 MYA, respectively. The bootstrap distribution of divergence time estimates was unimodal with low standard error estimates for most nodes, with a notable exception (table 4). Age estimates for the MRCA of Amborella and the Nymphaeales had two modes, corresponding to bootstrap replicate ML topologies with and without an Amborella + Nymphaeales clade. Trees with Amborella and the water lilies forming a clade gave a median age estimate of 135 MYA for their MRCA, whereas the median age estimate was 161 MYA for the 68 bootstrap replicates where Amborella was found to be sister to all other angiosperms.

The range of age estimates calculated from 200 bootstrap replicates is shown in table 4. It should be noted that whereas the small errors in our branch-length estimates follow legitimately from the large number of nucleotide positions included in our analyses, the errors on our divergence date estimates are artificially low given that our dating analysis was done under the assumption that the calibration point for the MRCA of all eudicots is known without error (see Graur and Martin 2004). The appearance of tricolpate pollen in the fossil record at the Barremian-Aptian boundary 125 MYA provides a minimum age for the MRCA of all extant eudicots.

Table 4
Estimated Ages (in millions of years) for Nodes Labeled in Figure 7

Node Label | Description | Constraint | Estimated Age
A | Euphyllophytes | 380 minimum, 401 maximum | 410
b | Seed plants | | 334
c | Pinus + Ginkgo | 310 minimum | 310
d | Angiosperms | 132 minimum | 161 (158–165)
e | Nymphaeales/Amborella | | 136 (134–165)
f | Nymphaeaceae s.s. | | 22 (21–23)
g | Magnoliids/monocots/eudicots | | 145 (143–147)
h | Monocots | | 133 (131–135)
i | Asparagales/commelinids | | 117 (116–118)
j | Poales | | 107 (106–109)
k | Core Poaceae | 55 minimum | 55
L | Eudicots | 125 fixed | 125
m | Core eudicots | | 113
n | Rosids | | 108 (108–109)
o | Faboideae | | 61 (61–62)
p | Asterids/Caryophyllales | | 105
q | Solanaceae | | 12

NOTE.—The median estimates are shown with the range of observed estimates for 200 bootstrap replicates. All estimates were within 1 Myr for the nodes with no range shown.

FIG. 7.—The ML phylogeny inferred from the nucleotide analysis with labeled nodes corresponding to the minimum divergence time estimates shown in table 4.

Discussion

Phylogenetic analyses of plant plastid genomes are providing new insights into the evolution of gene order (Cosner, Raubeson, and Jansen 2004), lineage-specific substitution rates and patterns (Ané et al. 2005), and factors influencing genome-scale phylogenetic inference (Lockhart et al. 1999; Goremykin et al. 2003, 2004; Soltis et al. 2004b; Stefanović, Rice, and Palmer 2004; Lockhart and Penny 2005; Martin et al. 2005). Alignment and orthology assignments are straightforward for the majority of coding regions, making plastid genomes ideal for phylogenetic reconstruction and studies of molecular evolution. Whole genome sequencing of plastid genomes provides copious data for testing hypothesized organismal relationships, comparing models of molecular evolution, and developing analytical methodologies. At this point, however, with few plastid genomes available for analysis, care must be taken to avoid being misled by the results of some analyses. With nearly 40,000 sites in our ungapped nucleotide alignments of 61 genes, any method that is susceptible to statistical inconsistency may be affected by long-branch attraction.

Our results lead us to reject all but two hypotheses concerning the basal-most extant angiosperm lineages. The ML analyses provide weak support for Amborella and Nymphaeales as a clade sister to all other angiosperms, but the more popular hypothesis placing Amborella alone as sister to all other angiosperms could not be rejected. The monocots clearly constitute a slightly younger clade in the angiosperm phylogeny. We estimate that the divergence of monocot and eudicot lineages occurred only 16 Myr after the MRCA of extant angiosperms (table 4). To put this result into context, the estimated dates for the youngest nodes on the ML phylogeny, Nicotiana/Atropa (10.56 ± 0.03) and Zea/Saccharum (8.87 ± 0.01), are just over half this age difference.

Despite the short internodes at the base of the angiosperm phylogeny, branching order can be resolved correctly with sufficient data. We conclude, however, that a lineage-specific increase in nucleotide substitution rates on the branch leading to the grasses and incomplete taxon