Genome Phylogenies Indicate a Meaningful α-Proteobacterial Phylogeny and Support a Grouping of the Mitochondria with the Rickettsiales

David A. Fitzpatrick, Christopher J. Creevey, and James O. McInerney
Department of Biology, National University of Ireland, Maynooth, County Kildare, Ireland

Placement of the mitochondrial branch on the tree of life has been problematic. Sparse sampling, the uncertainty of how lateral gene transfer might overwrite phylogenetic signals, and the uncertainty of phylogenetic inference have all contributed to the issue. Here we address this issue using a supertree approach and completed genomic sequences. We first determine that a sensible α-proteobacterial phylogenetic tree exists and that it can confidently be inferred using orthologous genes. We show that congruence across these orthologous gene trees is significantly better than might be expected by random chance. There is some evidence of horizontal gene transfer within the α-proteobacteria, but it appears to be restricted to a minority of genes (~23%), most of which (~74%) can be categorized as operational. This means that placement of the mitochondrion should not be excessively hampered by interspecies gene transfer. We then show that there is a consistently strong signal for placement of the mitochondrion on this tree and that this placement is relatively insensitive to methodological approach or data set. A concatenated alignment was created consisting of 15 mitochondrion-encoded proteins that are unlikely to have undergone any lateral gene transfer in the timeline under consideration. This alignment infers that the sister group of the mitochondria, for the taxa that have been sampled, is the order Rickettsiales.

Key words: α-proteobacterial phylogeny, supertree, mitochondria, removing fast-evolving sites.

Introduction

Proteobacteria, or the purple bacteria, are one of the largest known divisions within prokaryotes. Based on analysis of the evolution of 16S rRNA sequences, the proteobacteria division was first circumscribed by Carl Woese and coworkers (Woese et al. 1985; Woese 1987). Subsequent 16S rRNA analysis has led to the proteobacteria being divided into five subdivisions (α, β, δ, γ, and ε; Woese 1987).

The proteobacterial group is an important division as it includes known animal, human, and plant pathogens. Furthermore, this division has played a crucial role in eukaryotic cell origin and evolution through endosymbiosis. The hypothesis pertaining to an endosymbiotic origin of the mitochondria was first proposed in the 19th century (Altman 1890) and later popularized by Margulis (1981). Today, there is strong evidence to support the hypothesis that present-day mitochondria were once free-living α-proteobacteria (Andersson et al. 1998; Lang, Gray, and Burger 1999; Ogata et al. 2001). However, the exact position of the mitochondria within the α-proteobacteria is still debated (Wu et al. 2004). A number of analyses have placed the mitochondria among the order Rickettsiales (Gupta 1995; Lang, Gray, and Burger 1999), while others propose that mitochondria are more closely related to the Rickettsiaceae family and Rickettsia prowazekii in particular to the exclusion of the Ehrlichia and Anaplasma genera (Karlin and Brocchieri 2000; Emelyanov 2003). On completion of the Wolbachia pipientis wMel genome sequence, Wu et al. (2004) reported strong support for a grouping of Wolbachia and Rickettsia as a sister group to the exclusion of the mitochondria. A recent analysis by Esser et al. (2004) that utilized 55 individual and 31 concatenated genes encoded in the Reclinomonas americana and Marchantia polymorpha mitochondrial genomes concluded that, based on the available data, "Rhodospirillum rubrum comes as close to mitochondria as any α-proteobacterium investigated" (Esser et al. 2004).

Any attempt to determine which extant α-proteobacterium is the sister group of the mitochondria depends on the hypothesis that there is a robust and meaningful α-proteobacterial phylogeny. This in turn implies that within this group horizontal gene transfer (HGT) has been so infrequent that a meaningful species phylogeny can be derived. The alternative, that HGT is rampant and no phylogeny exists, would mean that attempting to ascertain the sister group to the mitochondria is pointless. Up until now, single-gene phylogenies (especially small-subunit rRNA-based ones) have established many of the accepted relationships between bacteria. Single-gene analyses are dependent on the gene having an evolutionary history that reflects that of the entire organism. A better approach would be to combine information from many genes, and to this end, supertree analysis is a logical method to employ. Another approach would be to concatenate genes. Concatenation of sequence data encounters problems when there is some evidence of HGT. These problems are comparable to those encountered when reconstructing phylogenies with genes that have recombined (Schierup and Hein 2000). Conversely, because of the way in which supertrees summarize taxonomic congruence, they limit the impact of individual gene transfers on the global topology (Escobar-Paramo et al. 2004). It is then possible using supertree methods to conduct a posteriori analyses of congruence between input trees and supertree.

The primary reason for the prerequisite of a robust supertree is to account for possible HGT events. HGT among bacteria has been shown to be a major source of genetic variation, and reported incidences of HGT are continually increasing (Martin et al. 1998; Wolf, Aravind, and Koonin 1999; de la Cruz and Davies 2000; Brown 2003; Kinsella et al. 2003). By investigating congruence across ortholog-derived phylogenetic trees, we can gauge how often HGT has occurred. If we knew how much error we might expect, then we would know how much confidence we
might invest in our conclusions. Recently, there have been suggestions that HGT, in general, has not been so severe that it has wiped out phylogenetic signal among closely related species (Daubin, Gouy, and Perriere 2001; Daubin, Lerat, and Perriere 2003), but ortholog trees contain enough conflict that early prokaryotic evolution cannot be represented effectively with a unique phylogenetic tree (Creevey et al. 2004).

In this study, we examine the nature and extent of HGT among the α-proteobacteria, we identify widely distributed genes that appear to have a history that does not include HGT, we use these genes in order to infer the sister group relationship between the mitochondria and members of the α-proteobacteria, and we discuss the implications of these inferences.

Mol. Biol. Evol. 23(1):74–85. 2006. doi:10.1093/molbev/msj009. Advance Access publication September 8, 2005. © The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.

Materials and Methods

Sequence Data

The bacterial database used in this analysis consisted of 13 completely sequenced α-proteobacteria genomes, i.e., Sinorhizobium meliloti, Agrobacterium tumefaciens, Wolbachia, R. prowazekii, Brucella melitensis, Caulobacter crescentus, Mesorhizobium loti, Anaplasma marginale, Ehrlichia ruminantium, Bradyrhizobium japonicum, Rhodopseudomonas palustris, Bartonella quintana, and Silicibacter pomeroyi, and the partial sequence data of Novosphingobium aromaticivorans, Rhodobacter sphaeroides, and R. rubrum. Magnetospirillum magnetotacticum MS-1 homologs were used in three instances in place of R. rubrum (explained later). Sequence data for the partial genomes were supplied by the Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/).

In addition to the α-proteobacterial genomes, we used the complete mitochondrial genomes of R. americana, Malawimonas jakobiformis, and M. polymorpha as well as the genome sequences of Neisseria meningitidis MC58 and Escherichia coli O157:H7, which were used as outgroups.

α-Proteobacterial Phylogeny

For supertree construction, homologous sequences were identified by performing an all-against-all search of the bacterial database using the BlastP algorithm (Altschul et al. 1997) with a cutoff expectation (E) value of 10⁻⁷. A homologous family was retained only if every member found every other member (and nothing else), the family was distributed across at least four taxa, and it was in single copy in every genome in which it was present. This conservative approach has been designed to minimize the inadvertent analysis of paralogs or spliced genes. In total, 539 families met these criteria. The individual protein families were then aligned using ClustalW 1.81 (Thompson, Higgins, and Gibson 1994) using the default settings. All alignments were corrected for obvious alignment ambiguity using the alignment editor Se-Al 2.0a11, and permutation-tail probability (PTP) tests were carried out to test for an evolutionary signal that was significantly better than random noise (P < 0.01). We found that 121 alignments failed the PTP test, and the remaining 418 alignments were used for phylogenetic analysis.

Appropriate protein models were selected for each of the 418 single-gene families within the supertree data set using the software program MODELGENERATOR (http://bioinf.nuim.ie/software/modelgenerator). MODELGENERATOR chooses the best amino acid substitution model from a total of 80 possible models. Bootstrap resampling was carried out 100 times on each alignment, and using the most appropriate of the available models, phylogenetic inferences were constructed using the software program PHYML (Guindon and Gascuel 2003). The results of these analyses were summarized using the majority-rule consensus method.

Supertree Reconstruction

In order to represent the variability within the data set, the best trees from each of the 100 bootstrap replicates from all 418 single-gene families were combined as the source data for a supertree analysis (a total of 41,800 input trees). The best supertree was found following a heuristic search of tree space using the most similar supertree analysis (MSSA) (Creevey et al. 2004) as implemented in CLANN 2.0.2 (Creevey and McInerney 2005). Using CLANN, 100 bootstrap resamplings were also carried out on the 41,800 source tree data set to assess the level of support for the internal branches. We tested for the presence of signal in the data set by performing the Yet Another Permutation-Tail-Probability (YAPTP) randomization test (Creevey and McInerney 2005), which tests the null hypothesis that congruence between the input trees is no better than random. In order to examine the behavior of perfect data, we generated input trees that were completely congruent with the supertree (idealized data set) using the "generatetrees" command in CLANN. For a detailed explanation of the above methods, see Creevey et al. 2004.

Shimodaira–Hasegawa Tests

To assess whether differences in topology between the supertree and individual gene trees are no greater than expected by chance, we performed Shimodaira–Hasegawa (SH) tests (Shimodaira and Hasegawa 1999) using Tree-Puzzle 5.1 (Schmidt et al. 2002). Gene trees varied in size from 4 to 16 taxa; therefore, when there were fewer than 16 sequences in any input data set, the supertree was appropriately pruned so that it contained the same leaf set that was found in the individual gene trees. The underlying amino acid alignment from which the input tree was derived was used for each SH test.

Mitochondrial Relationships

Using BlastP, with a cutoff E value of 10⁻⁷, the homologs that were common to R. americana, M. jakobiformis, and M. polymorpha mitochondria were used to find homologs among the bacteria (Altschul et al. 1997). In some instances, more than one match per genome was found. In these cases, the best match to the mitochondrial query was retained. This approach allows each α-proteobacterium to be as similar to mitochondria as possible at the level of sequence similarity and follows the procedure of Esser et al. (2004). Appropriate homologs from cox1, cox2, and cox3 genes could not be located for R. rubrum; therefore, following the procedure taken by Esser et al. (2004), M. magnetotacticum homologs were used instead, given that Rhodospirillum and Magnetospirillum are almost always well-supported sister taxa (Esser et al. 2004). In total, 28 gene families were found to contain all 21 taxa (N. meningitidis and E. coli outgroups, 3 mitochondria, and 16 α-proteobacteria). Individual mitochondrion-encoded protein families were aligned using ClustalW 1.81 (Thompson, Higgins, and Gibson 1994) using the default settings. Gaps created in the amino acid alignments were inserted into the nucleotide sequences to produce codon-based nucleotide alignments using the program putgaps (http://bioinf.nuim.ie/software/putgaps). All alignments were corrected for obvious alignment ambiguity using the alignment editor Se-Al 2.0a11.

Phylogenetic relationships for the 28 mitochondrion-encoded genes (mitochondrial data set) were reconstructed using a variety of methods. Inference of relationships was carried out using the Bayesian approach implemented in the MrBayes v3.0B4 software (Huelsenbeck and Ronquist 2001). Codon-based nucleotide alignments were analyzed with the assumption that among-site rate variation could be described using a discrete approximation to the gamma distribution (four categories of sites), and a proportion of the alignment was constrained to be invariant. The shape parameter of the gamma distribution and the hypothesized proportion of invariant sites were allowed to vary through the Markov Chain Monte Carlo (MCMC) chain. In total, four MCMC chains were run for 1.5 million generations; phylogenetic trees were sampled every 100th generation. Plots of the likelihood values revealed that the burn-in phase took at most 200,000 generations. Therefore, the trees sampled during the first 200,000 generations were discarded for all alignments. Clade probabilities for each phylogeny were determined using the sumt command of MrBayes v3.0B4. Maximum likelihood (ML) phylogenies based on amino acid sequences derived from the codon-based nucleotide alignments were reconstructed using PHYML, with the model of sequence substitution and estimated parameters governing rate variation across sites determined by MODELGENERATOR. Additionally, neighbor-joining (NJ) trees based on nucleotide LogDet distances (Lockhart et al. 1994) were reconstructed using PAUP* 4.0b10 (Swofford 1998). Constant and third codon positions were removed from all alignments for the LogDet analyses. Support for relationships was determined by carrying out 100 bootstrap resamplings of the data, constructing phylogenetic trees, and summarizing the results using a majority-rule consensus method. Phylogenetic trees derived from the equivalent protein alignments were also reconstructed using ML and Bayesian methods, and in all cases nucleotide- and protein-derived trees were comparable. We could not reconstruct phylogenetic trees using the LogDet transformation on protein sequences for the majority of our 28 mitochondrion-encoded genes because there were instances where the determinants were zero and the distances undefined (see Thollesson 2004).

Concatenated Mitochondrial Data Set

A concatenated alignment was constructed consisting only of those mitochondrion-encoded genes whose inferred phylogeny is not significantly different (according to the SH test) to the proposed α-proteobacteria supertree. The reasoning behind this strategy is that these genes probably have not undergone HGT. This concatenated alignment consisted of 15 genes (table 1) and in total had 3,039 aligned amino acid positions; removal of gapped positions reduced the alignment to 1,781 amino acids. Phylogenetic relationships for the taxa within this alignment were derived from LogDet distances and the ML criterion. ML trees were reconstructed using PHYML using the appropriate model of sequence substitution according to the hierarchical likelihood ratio test implemented in MODELGENERATOR. An NJ tree based on amino acid LogDet distances (Lockhart et al. 1994) was reconstructed using LDDIST (Thollesson 2004); the fraction of invariant sites was estimated by the method of Sidow, Nguyen, and Speed (1992), and these sites were excluded. Support for groups on trees was determined using the bootstrap resampling technique. Splits implied by sites in the concatenated alignment were found using the LogDet distances described above and NeighborNet (Bryant and Moulton 2004). These were represented as a planar graph using the SplitsTree software (Huson 1998). Phylogenetic networks permit the representation of conflicting signal or alternative phylogenetic histories (Fitch 1997). When the evolutionary history of genes is not treelike, it is necessary to use phylogenetic networks (Bryant and Moulton 2004). Even when the underlying history is treelike, parallel evolution and sampling error may make it difficult to determine a unique tree. In these cases, networks provide a valuable tool for representing ambiguity or for visualizing a space of feasible trees (Bryant and Moulton 2004).

Table 1
Mitochondrion-Encoded Proteins Whose Phylogeny Is Not Significantly Different (P > 0.05) to the Proposed α-Proteobacterial Supertree According to SH Test

Gene    LogDet       BP    ML          BP    Bayesian    Prob   Sites
cox3    Rick+og      72    Rick+Bar    80    Rick+Bar    0.95   396
rpl5    Rick         84    Rick+og     52    Rick        1      200
rps14   Rick         70    Rick(a)     70    prow        0.93   102
rps11   Rick         87    Rick        71    Rick        0.97   132
nad3    Rick         68    Rick(a)     97    Rick(a)     0.97   130
rps2    Rick         62    Rick        60    Rick        1      336
rps3    Rick         88    prow        80    prow        0.99   253
rps8    Rick+og      97    Rick+og     51    prow        1      133
rps12   Rick         98    Rick        96    Rick        1      123
rpl6    Rick         82    Rick        74    Rick        0.95   189
rps1    Rick         79    Rick        ch    Rick        1      591
rps13   Rick+og      70    Rick        82    Rick+og     0.97   121
rps19   Rick         73    Rick        96    Rick        0.99   94
nad4l   (Rick+og)    all   Rick+Bar    72    Rick        0.99   102
rpl16   Rick+og      ch    Rick        97    Rick        0.97   137

NOTE: Rick, Rickettsiales; prow, Rickettsia prowazekii; Bar, Bartonella quintana; og, Escherichia coli + Neisseria meningitidis; BP, bootstrap support values; Prob, Bayesian clade probabilities. The optimum topology according to the SH test is in boldface. The sister taxa of the mitochondrion according to LogDet, ML, and Bayesian reconstruction methods are shown in columns 2, 4, and 6, respectively. Corresponding bootstrap support values and Bayesian clade probabilities are shown in columns 3, 5, and 7.
(a) Except Anaplasma marginale.

We also subjected the alignment to a spectral analysis. Spectral analysis is a quantitative method that is suited for testing alternative phylogenetic hypotheses when relationships among taxa are controversial (McCormack et al. 2000; Winchell et al. 2002). A spectral analysis on protein LogDet distances creates a complete array of bipartitions and splits in the data, which includes every possible grouping among the taxa used. The strength of this method is that it works independently of any one particular tree (Lento et al. 1995).

According to Tree-Puzzle 5.1, the concatenated alignment fails the χ² test for homogeneous amino acid composition (P < 0.05). The highly variable sites were iteratively removed from the concatenated alignment using two different methods in order to give the largest possible alignment in which all sequences passed the χ² test for homogeneity of amino acid composition. We used two methods because, when we analyzed the data of Esser et al. (2004), we found an alternative phylogenetic network to the one they proposed (fig. 1). This alternative inference arose because we had used a different method to remove fast-evolving sites.

FIG. 1. Comparison of two phylogenetic NeighborNets. The underlying alignment consists of 31 concatenated mitochondrion-encoded genes and is identical in size and content to the alignment analyzed by Esser et al. (2004). The alignment has had categories of fast-evolving sites removed until the alignment passes a χ² test (P < 0.05) for amino acid homogeneity. The top phylogenetic net (A) is derived from an amino acid alignment with fast-evolving sites categorized using the method of Hansmann and Martin (2000). The second phylogenetic net (B) has had fast-evolving sites categorized by a gamma distribution removed. Following the methodology of Esser et al. (2004), distances were determined using the LogDet transformation and invariant sites have been removed. Network (A) is identical to that presented by Esser et al. (2004). In network (B) the Rickettsiales are inferred to be the sister taxon of the mitochondrion. Differences in network topologies are solely the result of different methods used to strip fast-evolving sites.

The first method, described by Hansmann and Martin (2000), scores sites depending on the number of amino acid character states that are found at that site. This effectively assumes that there is a star phylogeny with equal branch lengths and is the method used by Esser et al. (2004). Sites with high scores were iteratively removed until a compositionally homogeneous alignment of 574 amino acids was found. The second method is implemented in Tree-Puzzle 5.1 and assumes that there are eight categories of sites. Rate variation across these sites is assumed to follow a discrete gamma distribution. Again, fast-evolving categories of sites were iteratively removed until an alignment of 651 amino acid positions that passed the χ² test for homogeneity of amino acid composition was found for the 15-gene data set. Phylogenetic hypotheses for the homogeneous alignments were generated using the ML framework described earlier.

To account for problems associated with deep level relationships, the amino acid alignment was recoded into the six Dayhoff groups: C, STPAG, NDEQ, HRK, MILV, and FWY. This is based on the notion that amino acid substitutions within the six groups will be common and noisy, while changes between groups will be rarer and so have less saturation (Hrdy et al. 2004). The recoded alignment was analyzed using the Bayesian criterion implemented in MrBayes v3.0B4 (Huelsenbeck and Ronquist 2001). The model used has a 6 × 6 general time-reversible rate matrix. Among-site variation in evolutionary rate was modeled with an allowance for a proportion of sites to be invariable and four categories of variable sites with their rates modeled by a discrete approximation to the gamma distribution. The analysis used an MCMC chain that ran for 2 million generations sampled every 100th generation; the first 5,000 samples were discarded as the burn-in. Support for groups on the tree was determined using the sumt command of MrBayes, which produces a majority-rule consensus tree generated from bipartitions found most frequently during the MCMC runs.

Results

α-Proteobacteria Supertree

To test whether there is a meaningful α-proteobacterial phylogeny, a data set consisting of 16 α-proteobacterial genomes or 60,640 individual coding sequences was utilized. We identified 418 single-gene families that met our criteria of being present in single copy in at least four genomes and that have phylogenetic signal according to the PTP test. Amino acid sequences were aligned, resulting in a combined length of 140,314 aligned positions. Bootstrap resampling followed by phylogenetic inference resulted in 100 ML trees for each family. Using these 41,800 trees as input data for supertree analysis, heuristic searches of supertree space were carried out and the optimal supertree was identified (fig. 2). One hundred bootstrap resamplings of the input trees were carried out, and the support values of the internal branches were placed on the supertree in figure 2.

How Compatible Are the Input Trees?

An idealized data set was created where the topology of the input trees was changed to make these trees completely compatible with the supertree (ideal data), and also the input trees were completely randomized 100 different times in order to carry out the YAPTP test. The scores of 10,000 random supertrees were then obtained based on the real (unperturbed), ideal, and randomized input trees, and the distributions of these supertree scores were assessed. In addition, we sought to determine how many perturbations per tree were needed to give the ideal data the kinds of characteristics that we observe in the real data and how many perturbations might be required to completely randomize the input trees. In this way, we determined how close the real data were to ideal data and how close they were to random data.

The range of supertree scores for the real data varied from 33,196 (best) to 92,027 (worst) (fig. 2). In general, the scores of the worst supertrees from all three data sets were similar. The distribution of the scores of the optimal supertrees from the YAPTP test is centered on 70,400 (±300). This indicates that congruence across the input trees is greater than expected by random chance (Creevey et al. 2004). The ideal data generated in this study represent a scenario where there is complete compatibility between the input trees and supertree in figure 2. The distribution of tree scores for this ideal data set was heavily skewed to the left, with a skewness value of −0.8. The real data were slightly more skewed to the left with a skewness value of −0.11. The randomized data had a skewness value of −0.02, which is close to zero, as expected. For all three distributions, the standard deviation (SD) of the skewness was 0.02.

We then took the ideal data set and, applying the subtree pruning regrafting (SPR) method on the input trees

FIG. 2. The shape of supertree space. In diamonds are the scores for 10,000 random supertrees when scored against an ideal source tree data set. The best tree from the ideal distribution has a score of 0 and the skewness of the ideal distribution is −0.08. The triangles are the scores of 10,000 random supertrees when compared to the real source tree data set. The skewness of this distribution is −0.11. Heuristic searches of supertree space using the real source data revealed that the best supertree had a score of 33,196, and when bootstrapped, nearly every internal branch of the tree had 100% bootstrap support value. The squares are the scores of 10,000 random supertrees when compared to a randomized set of source trees; this distribution had a skewness of −0.02. The SD of the skewness for the ideal, real, and randomized data was 0.02. The circles demonstrate the results of 100 YAPTP tests.
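The skewness values quoted for the supertree score distributions in figure 2 can be illustrated with a short sketch. It assumes that skewness is taken as the standardized third central moment of the scores, which is one common definition; the text does not state which estimator was used. The function names and the toy scores below are hypothetical and are not part of CLANN. The second function gives the usual large-sample approximation to the standard error of skewness for normally distributed data, sqrt(6/n); for n = 10,000 random supertrees this is close to the SD of 0.02 reported, although whether the authors derived their value this way is an assumption.

```python
import math


def skewness(scores):
    """Standardized third central moment, g1 = m3 / m2**1.5.

    Negative values mean a longer tail on the left (low-score) side.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three scores")
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # second central moment
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    if m2 == 0:
        raise ValueError("all scores are identical; skewness is undefined")
    return m3 / m2 ** 1.5


def skewness_standard_error(n):
    """Large-sample approximation sqrt(6/n), valid for roughly normal data."""
    return math.sqrt(6.0 / n)


# Hypothetical toy scores with one unusually good (low) supertree score.
toy_scores = [40, 72, 75, 78, 80, 81, 83, 85, 88, 92]
print(skewness(toy_scores) < 0)                    # left-skewed sample
print(round(skewness_standard_error(10_000), 4))   # about 0.0245
```

A distribution in which a few supertrees score much better than the bulk, as for the toy scores here, has negative skewness, which is the sense in which the ideal and real distributions are described as skewed to the left.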
(equivalent to introducing an HGT event), we incrementally perturbed the ideal data set. These SPR events were introduced, and at each stage the best supertree was found and its score for the data set assessed until the score, according to the MSSA method, was the same as either the real or randomized data. This gave an indication of the number of perturbations required to make ideal data as noisy as the real (or random) data set. Each of these simulations was carried out 100 times. From the 100 simulations, the mean number of SPR events per tree required to transform ideal data into the equivalent of the real data set was 1.6 (with an SD of 0.07). To transform the ideal data into data with an equivalent level of incongruence as random data required 5.0 HGT events per tree (with an SD of 0.28).

Taken together, it can be seen that the real data are significantly different from random data (according to the YAPTP test) and that, by two measures, they are more similar to ideal data than to random data.

Of the 418 input trees, SH tests indicated that 98 (23%), whose topology could not be induced by the appropriately pruned supertree, described their underlying alignments significantly better (P < 0.05) than did the supertree. The differences in topology could not be explained by chance alone. For the remaining 320 trees, there was no significant difference between the proposed supertree and underlying tree (24% of these are identical to the proposed supertree). HGT events and hidden paralogies are among the possible reasons for the 98 families having an evolutionary history that is significantly different to that implied by the supertree. In an effort to quantify the types of genes that display significantly different topologies to the supertree, we categorized genes as being either operational or informational in function based on the criteria used by Jain, Rivera, and Lake (1999). According to the complexity hypothesis (Jain, Rivera, and Lake 1999), we would expect to see operational genes being exchanged at a higher rate than informational genes. Informational genes are members of large complex systems, thereby making horizontal transfer of these gene products less probable (Jain, Rivera, and Lake 1999). We found 87 informational genes and 96 operational genes; the remaining genes have yet to be assigned a function (i.e., conserved hypothetical) or belong to a category that is neither operational nor informational. In total, 18 of the 87 informational genes (~20%) were found to have a topology that is significantly different to the proposed supertree, while 50 of the 96 operational gene trees (~51%) differ significantly to the supertree according to the SH test. Assuming that systematic or stochastic errors in phylogenetic reconstruction of the input trees are not function dependent and also that the supertree is truly representative of the relationships among the α-proteobacteria, this would indicate that the operational genes within this data set experience significantly (P < 0.001) more horizontal transfers or other anomalous events than informational genes.

Mitochondrial Origin

Individual Mitochondrion-Encoded Genes

The overlap between the 28 mitochondrion-encoded genes and the 301 genes that are compatible with the supertree was 15 genes. Phylogenetic hypotheses based upon these 15 mitochondrion-encoded genes were constructed for the codon-based nucleotide alignments using LogDet, ML, and Bayesian methods. All three methods revealed a general consensus regarding the possible sister group of mitochondria (table 1). The majority of these 15 gene trees placed the members of the Rickettsiales as the sister group of the mitochondria. We carried out a variety of SH tests using the Whelan and Goldman (2000) model of amino acid substitution. For each gene family, the underlying alignment and the three proposed topologies derived from the above three methods were used. An additional "constrained" tree where R. rubrum was placed as a sister group to the mitochondria (recreated using MrBayes) was also included in the analysis. The reasoning behind this approach is that R. rubrum has been suggested as a possible sister group to the mitochondria (Esser et al. 2004). According to the SH test, the constrained tree was significantly (P < 0.05) worse than the three alternative competing topologies in all cases (results not shown) and was therefore not considered for further analysis. Examining the optimum topologies according to the SH test reveals that in total 11 trees placed the Rickettsiales as the sister group to the mitochondria. One tree placed the Rickettsiales (except for A. marginale) as the sister group to the mitochondrion, another inferred that the Rickettsiales combined with B. quintana were the sister group, another tree placed the Rickettsiales and the outgroup together, and then another split separated this group and the mitochondria from the rest of the taxa. The final tree placed the mitochondria with R. prowazekii as sister taxa (table 1).

Concatenated Alignment

Concatenation of the 15 mitochondrion-encoded genes of interest gave an alignment of 3,039 amino acids in length or 1,781 amino acids when positions containing gaps were removed. Concatenation of sequence data has the benefit of reducing stochastic effects related to short sequence length. According to the amino acid compositional heterogeneity test, there was severe amino acid bias within the data set; therefore, the assumption of stationary amino acid frequencies (Gu and Li 1996) was violated. To account for these biases, the LogDet transformation (Lockhart et al. 1994) was used to infer the relationships among the α-proteobacteria and the mitochondria for the data set containing no gaps. NJ trees constructed using LogDet distances derived from the amino acid alignment place the Rickettsiales as the sister group to the mitochondria with 99% bootstrap support (fig. 3A). Phylogenetic hypotheses were also inferred using the program PHYML using the appropriate model of sequence substitution selected by MODELGENERATOR. The ML topology was identical to the LogDet topology, and branch lengths and bootstrap support values were also comparable (fig. 5A). The NeighborNet of LogDet protein distances for this alignment also places the Rickettsiales as the closest α-proteobacterial group to the mitochondria (fig. 3B). Comparing the supertree in figure 2 to the tree built from the concatenated sequence data (fig. 3A), we see that the main orders such as the Rickettsiales, the Rhizobiales, and the Rhodobacterales are strongly supported clades in both cases. In our supertree, C. crescentus and R. rubrum are grouped as sister taxa with poor support (50% bootstrap support). Novosphingobium aromaticivorans is grouped beside the Rhodobacterales, also with 50% bootstrap support. These findings are not entirely surprising as these bacteria are lone members of the orders Caulobacterales, Rhodospirillales, and Sphingomonadales, respectively, in our tree. Contrastingly, the concatenated alignment places these orphan taxa on the tree with very high levels of support. For example, C. crescentus is grouped beside the Rhodobacterales with 99% bootstrap support.

FIG. 3. The ML phylogeny for the concatenated alignment is shown in (A). Bootstrap supports are shown for selected nodes. The inferred topology is very similar to an inference based on LogDet distances (see fig. 5A). The Rickettsiales group beside the mitochondrion with 99% bootstrap support. A NeighborNet of the concatenated alignment is also displayed in (B); this phylogenetic net is based on protein LogDet distances. The Rickettsiales appear to share a split with the mitochondrion. A Lento plot (C) that utilizes the aforementioned LogDet distances for the concatenated alignment is also displayed. The top 40 supported splits (excluding individual taxa) are displayed. Bars above the x axis represent frequency of support for each split. Bars below the x axis represent the sum of all conflicts against the corresponding split above the x axis. Letters above columns represent particular splits in the data, and these have been mapped onto the phylogenetic tree for display purposes. Attention is drawn to split h, which includes the mitochondrion and the Rickettsiales, and split k, which includes Rhodospirillum rubrum and the mitochondrion. The conflict for split k far outweighs any support.

A spectral analysis of LogDet distances was performed using Spectrum (http://taxonomy.zoology.gla.ac.uk/~mac/spectrum) to further investigate the phylogenetic signal

[Tree figure: taxon labels with clade support values of 0.99 to 1.00; scale bar 0.1.]

[...]ing is real. The first test simply involved the systematic analyses of several data sets that only had a single long-branch taxon included. This test was to see if the long-branch taxa were still attaching to the tree in the same place, even when other long-branch taxa were absent. In all three cases, the long-branched taxa attached to the R. rubrum branch with high bootstrap support (fig. 5).

The second test of long-branch attraction involved the removal of the long-branch taxa from the tree (see fig. 6) and their pseudorandom reinsertion 100 times. The taxa were inserted into the tree as long as the point of insertion was not beside another long-branch taxon. This resulted in the construction of phylogenetic trees with interspersed long and short branches, but where no long-branch taxa were neighbors. Once the taxon-insertion process was completed, the concatenated alignment was used to determine appropriate branch lengths for all taxa. Naturally, the branches were quite long for the inserted taxa. We then simulated the evolution of a set of sequences for this artificial tree using Seq-Gen (Rambaut and Grassly 1997), and using the same methods of analyses that we have used previously on the concatenated data set, we reanalyzed the simulated sequences to see how often the long-branch taxa were incorrectly placed beside each other (i.e., what was the strength of the long-branch attraction in a data set of this nature). We found that even though there could be slight discrepancies between the
ruminantium treeusedtosimulatethedataandthetreederivedfromthe FIG.4.—InferredphylogenyshowsaBayesianconsensustreeofthe simulateddata,therewasneverasinglecasewherethelong- 15-geneconcatenatedalignment.Theanalysisusesageneraltime-revers- branchtaxawereclusteredtogether.Althoughthistestcan- iblesubstitutionmatrixestimatedfromaminoacidsequencesrecodedinto notbetakenasdefinitiveproofthatlong-branchattraction thesixDayhoffgroups.Among-siterateheterogeneitywasmodeledusing afour-categorygammacorrectionwithafractionofinvariantsites.The aloneisnotresponsiblefortheirclusteringintherealdata analysis used the MCMC strategy from MrBayes; parameters such as set,weatleastshowthatfordatasetswithsimilarcharacter- thecompositionandsubstitutionratematrixwerefree.Posteriorprobabil- istics(composition,degreeofoverallevolutionarychange, itiesforselectedbranchesareshownatnodes. sequence length, mixture of long, and short branches)this effectisnotsevereenoughtoresultincompletelyartifactual present in our concatenated alignment. Lento et al. (1995) topologies. have shown that a spectral analysis of LogDet distances We also recoded the amino acid data into the six seemstoreducetheamountofrandomnoisecomparedwith groups of chemically related amino acids and analyzed the anormalcharacter-basedspectralanalysis.AsSpectrumcan data using theBayesian criteria.Thisapproach hastheef- onlyaccommodate20taxa,weremovedA.tumefaciensfor fect of shortening long branches and homogenizing the this part of the analysis. With 20 taxa, thetotal number of amino acid composition among sequences (Hrdy et al. possiblesplitsis220(cid:1)1or542,288.Ofthese,101splitsare 2004).Fromthisanalysis,wefoundthatthemitochondrion supported by some evidence and the 40 most highly sup- and Rickettsiales are again strongly supported (fig. 4). ported are represented by the bars shown in the top half Removing the fast-evolving sites from the 15-gene of figure 3C. 
Attention is drawn to the split marked h, concatenated alignment using the method of Hansmann the taxa within this split include the Rickettsiales and the and Martin (2000) until the reduced alignment passes the mitochondria. The split marked k represents the grouping amino acid compositional heterogeneity test resulted in of the mitochondria and R. rubrum as sister taxa. This re- analignmentof574aminoacidsinlength.TheNeighborNet lationship is not observed in either our phylogenetic tree of LogDet protein distances for this shortened alignment (fig. 3A) or phylogenetic network (fig. 3B). Furthermore, places the Rickettsiales as the closest a-proteobacterium the degree of conflict for this split far outweighs the level group to the mitochondria. Bootstrap resampling was ofsupport.Forcompleteness,wealsoperformedtheabove employed to assess levels of support for groups. This analyses on the alignment with gap positions still present. showedthattheRickettsialesandthemitochondriaformed The results of these analyses were in agreement with our acladewithrelativelylowbootstrapsupport(65%,results observations in the gapless alignment. notshown).Usingthegammadistributiontomodelsite-rate Lookingatourphylogenetictreeandnetwork(fig.3A variation in evolutionary rate and removing fast-evolving and3B),itisobviousthatthetaxawithlonginternalbranches sites until the data passed the v2 test of compositional ho- cluster together—the mitochondria, the Rickettsiales, and mogeneity resulted in an alignment of 651 amino acids in the outgroups. We carried out two tests to investigate length. The NeighborNet of LogDet protein distances for whetherthisgroupingisasystematicerror(long-branchat- this shortened alignment places the Rickettsiales beside traction,alsoknownastheFelsensteinzone)orthegroup- themitochondriaastheyshareasplit;abootstrapanalysis 82 Fitzpatricketal. (A) N. meningitidis 100 E. coli R. rubrum N. aromaticivorans C. crescentus 87 100 B. japonicum R. palustris 95 87 B. 
melitensis B. quintana 95 M. loti 70 A. tumafaciens 100 S. meliloti R. sphaeroides 100 S. pomeroyi M. polymorpha 100 R. americana 60 M. jakobiformi 81 R. prowazekii 88 Wolbachia 100 E. ruminantium 0.1 100 A. marginale (B) 100 B. japonicum (C) 100 B. japonicum (D) N. aromaticivorans R. palustris R. palustris C. crescentus 91 5494 BB.. qmueilnitteannsais 96 100 AS.. tmuemlialofaticiens 98 60 BM.. qluotiintana 76 60 100 100 ASMC... . mt culroemetlisialcofeatnictiuesns 70 70 100 4090 BBM.. .qm luoetilinitteannsais 76 91100 110000 SBAB.... mjtmauepemlloiilantoefitancicusiimesns C. crescentus R. palustris 98 100 SR.. psopmhaeeroroyiid es 88 100 R. sphaeroides 100 S. pomeroyi S. pomeroyi R. sphaeroides N. aromaticivorans R. rubrum 100 R. americana N. aromaticivorans R. prowazekii 100100 MM.. jpaokloybmifoorrpmhia 88100 NE.. cmoeliningitidis 98100100 WAo. lmbaacrhgiianale R. rubrum R. rubrum 100 E. ruminantium FIG.5.—Mainphylogram(A)showstheinferredtopologyoftheproteinconcatenatedalignment;distanceswerederivedfromLogDetTransfor- mation.Twoofthethreelongbranchesareremovedfromthealignmentandthetreeisredrawn.Thegroupingoftheremaininglongbranchtothefree- livinga-proteobacteriaisnoted;treesarerootedaroundtheremaininglongbranchforconvenience.(B)ThemitochondriongroupsbesideRhodospirillum rubrum.(C)TheoutgroupssitbesideR.rubrum.(D)TheRickettsialesgroupbesideR.rubrum. again revealed weak support for this finding (52%, results optimal supertree to the size of each individual gene tree, not shown). It should be pointed out that the topologies 77%ofthetopologiesareeitheridenticaltothesupertreeor, forthefullandshortenedalignmentareidentical;bootstrap wherethetopologiesaredifferent,thisdifferenceisnotsig- support values for the shorter alignments are lower, how- nificant(i.e.,accordingtotheSHtest,thesedifferencesare ever. 
As we systematically removed categories of fast- nogreaterthanwemightexpectfromsamplingeffects,P, evolving sites from the alignment, the bootstrap support 0.05).Thisiscompatiblewiththehypothesisthatforthese values decreased across the tree but the overall branching data setsthetopologyinducedbythesupertreeinfigure1 structure never changed. can adequately describe their evolution. Conversely, this finding is incompatible with a hypothesis that HGT has Discussion completely randomized genome evolution. The remaining To investigate the sister group relationship between 23%oftreesthatshowasignificantdifferencecouldbedo- thea-proteobacteriaandthemitochondria,itisimperative ingsoasaresultofsystematicbiasesinreconstructingphy- thatthereisameaningfula-proteobacterialspeciesphylog- logenetic relationships for individual gene trees, HGT, or eny. We use a supertree-based approach for simultaneous hiddenparalogy.Itisunlikely,though,thatHGTaccounts determination of species-level phylogeny and analysis of for all 23% of the genes that are incompatible, given the HGTfor16a-proteobacteriaderivedfrom418genetrees. difficulty of inferring orthology with absolute certainty. Thebestsupertreereceivesabetterscore(33,196)thanany We have shown that a significantly (v2 test, P , 0.001) tree from the randomization procedure (YAPTP) which higher percentage of operational genes disagree signifi- meanstheindividualgenetreeshaveasignificantlyhigher cantly with the supertree, a finding in agreement with levelofagreementwitheachotherthanwouldbeexpected the complexity hypothesis. A bootstrap analysis of the in- by chance. 
The score distributions from real and idealized put orthologous trees also lends support to a robust a- gene trees (as judged by skewness values) are remarkably proteobacteria phylogeny as the majority of splits are similar.Itrequiresanaverageof1.6SPRbranchswapsper stronglysupported.Takentogether,wesuggestthatthea- tree to give ideal data a similar score to the real data, proteobacteria have a meaningful phylogeny. whereasitrequires5.0SPRswapspertreetoconvertideal Recently,R.rubrumhasbeensuggestedtobethesis- data into random data. We find that when we prune the ter group of the mitochondria based on a concatenated GenomePhylogeniesIndicateaMeaningfula-ProteobacterialPhylogeny 83 fast-evolving sites were present in the alignment. As the fast-evolving sites were removed and the alignments be- came shorter with a higher proportion of invariant and slow-evolving sites, we lost phylogenetic signal. Further evidence that the mitochondria share a com- mon ancestor with the Rickettsiales to the exclusion of the other taxa comes from individual mitochondrial pro- teins. All 15 individual trees indicate a sister group rela- tionship between the mitochondria and at least one of the bacteria from this order. It has been suggested that phylogenetic inferences based on highly conserved mito- chondrial proteins such as cytochrome oxidase subunits cox3 have been suggested as better indicators of relation- ships than fast-evolving mitochondrial genes that may infer incorrect relationships as an artifact of long-branch attraction (Lang, Gray, and Burger 1999). 
Taking the in- ferred evolutionary history of cox3 alone, we can see FIG.6.—Thisphylogramdepictsareducedconstrainedtree.Thistree that this gene would suggest that the mitochondria and onlycontainstheshort-branchedtaxa.Long-branchedtaxaarerandomly Rickettsialesareeachothers’closestrelatives(table1)with attachedtotheconstrainedtree.Relativebranchlengthsforthenewen- bootstrap support for this relationship being relatively largedconstrainedtreeweredeterminedinPHYMLusingtheoriginalcon- catenatedalignment.Thisprocedurewasrepeated100timestogive100 strong (;80%). randomconstrainedtrees. Abetterunderstandingoftheorigin andevolution of themitochondrionwillonlybepossiblewhenamorerep- alignmentof31mitochondrion-encodedgenes(Esseretal. resentative sample of the a-proteobacteria is fully se- 2004). Our analysis utilized 15 mitochondrial genes that quenced. It is entirely possible that there are many more haveaphylogenythatisnotsignificantlydifferent(SHtest, mitochondrion-like bacterial species yet to be identified. P,0.05)totheproposed supertree. Using thisapproach, Currently, only 0.4% of all existing bacterial species have wehaveminimizedtheobfuscatinginfluenceofincluding beenidentifiedandformallydescribedletalonesequenced. genes whose history might not be compatible with one Anotherconsiderationinsuchananalysismustbethedraw- another. Our data point to the conventional sister group backs associated with the use of eukaryote mitochondrial relationship of Rickettsiales and the mitochondrion. For DNAandthehighrateofnucleotidesubstitutionassociated completeness,weperformedouranalysesonalargeralign- withthesegenes.Theseproblemscanbeexacerbatedwhen ment containing all 28 mitochondrion-encoded proteins, we use mitochondrion sequences in studies with bacterial and the conclusions of these analyses agree with the find- homologs that have divergence times of approximately ings of our initial analysis (results not shown). 
When we 1.5–2billionyears(Sicheritz-Ponten,Kurland,andAndersson reexamined the 31-gene sequence alignment of Esser 1998). The end result of such an analysis can be severely et al. (2004), we found alternative topologies depending affected by long-branch attraction. Throughout this analy- onwhatmethodwasutilizedtoremovefast-evolvingsites. sis,weweremindfuloftheseproblemsandendeavoredto Whenfast-evolvingsiteswerestrippedinthesamemanner take steps such as the use of LogDet distance matrices to as their analysis, we recreated identical results to those helppreventmisleadingresults,andweinvestigatedthepo- authors (fig. 1A). Alternatively, when fast-evolving sites sition of long-branched taxa in the absence of other long categorized by a gamma distribution are removed, the branches. Another approach that is analogous to a trans- RickettsialesandnotR.rubrumareinferredtobethesister version analysis of nucleotide sequences is to recode the grouptothemitochondrion(fig.1B).Theseresultsraisecon- inputalignmentintothesixDayhoffcategoriesandsubse- cerns regarding the method used to strip out fast-evolving quentlyinfertheirphylogeny.Thisapproachhastheadvan- sites. The method described by Hansmann and Martin tage that we could use a Bayesian approach and optimize (2000) strips sites out based on the number of character ageneraltime-reversiblesubstitutionmatrix,ratherthanas- states found at a particular site, while the discrete gamma suming an ad hoc amino acid substitution matrix. We in- methodusesMLestimationtoplacesitesintodifferentsite vestigated such an approach on the 15-gene alignment categories.Clearly,thesemethodscategorizesitesverydif- (gapped positions removed) and found that the grouping ferently,andthemeritsofeachwillneedfurtherstudy.Inour of the mitochondrion and Rickettsiales are again strongly analysis, wehavemoreextensive taxon sampling, and also supported. 
we erred on the side of caution and removed fast-evolving Esseretal.(2004)makethepointthattheoverallfer- sites using both methods described above until we found mentativephysiologyofR.rubrumisquitesimilartoeukar- alignments that passed a v2 test of amino acid composi- yotes that lack mitochondria or contain anaerobic tionalheterogeneity.Phylogeneticinferencesofbothresul- mitochondria. Similarly, it can also be argued that there tant alignments yielded similar conclusions in that the is a strong affinity between mitochondria and Rickettsia Ricketsialesareinferredtobethesistertaxonofthemito- for the genes coding for components of the Krebs cycle chondrionwithvaryingdegreesofsupport(65%and52% (Anderssonetal.1998),therespiratorychain(Gray,Burger, bootstrap support, respectively). Higher levels of support and Lang 1999), and the translation system (Sicheritz- for these inferences were observed when categories of Ponten, Kurland, and Andersson 1998).
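The comparison of informational and operational genes reported above (18 of 87 versus 50 of 96 gene trees differing significantly from the supertree) can be checked with a simple 2 × 2 contingency test. The sketch below is illustrative only: it assumes a Pearson χ² test without continuity correction, which the text does not specify, and the function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Assumption: the text only says "chi-square test"; the exact variant is not stated.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts from the text: (trees differing significantly, trees not differing).
informational = (18, 87 - 18)
operational = (50, 96 - 50)

chi2, p = chi_square_2x2(*informational, *operational)
print(f"chi2 = {chi2:.2f}, P = {p:.2g}")
```

Under these assumptions the statistic lies well beyond the 0.001 critical value for one degree of freedom (about 10.83), which is in line with the reported P < 0.001.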
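The recoding of amino acids into the six Dayhoff categories, used above for the Bayesian analysis, can be illustrated with a short sketch. The grouping shown is the commonly cited six-class scheme (AGPST, C, DENQ, FWY, HKR, ILMV); the digit symbols, the handling of gaps and ambiguous residues, and the function name `dayhoff_recode` are assumptions made for illustration and are not taken from the paper.

```python
# Six Dayhoff classes of chemically related amino acids.
# The digit labels are arbitrary (an assumption); any six distinct symbols would do.
DAYHOFF_GROUPS = {
    "0": "AGPST",  # small
    "1": "C",      # cysteine
    "2": "DENQ",   # acid and amide
    "3": "FWY",    # aromatic
    "4": "HKR",    # basic
    "5": "ILMV",   # hydrophobic
}

# Invert the table: one-letter amino acid code -> class symbol.
RECODE = {aa: code for code, members in DAYHOFF_GROUPS.items() for aa in members}

def dayhoff_recode(seq):
    """Recode a protein sequence; gaps stay '-', anything unrecognized becomes '?'."""
    return "".join(RECODE.get(r, "-" if r == "-" else "?") for r in seq.upper())

print(dayhoff_recode("MKV-C"))
```

Collapsing 20 states into 6 discards substitutions within a class, which is why the approach resembles a transversion-only analysis of nucleotides: fast, largely saturated changes are hidden, at the cost of some phylogenetic signal.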