Downloaded from genome.cshlp.org on January 7, 2023 - Published by Cold Spring Harbor Laboratory Press

Research

Comparative Genomics of the Archaea (Euryarchaeota): Evolution of Conserved Protein Families, the Stable Core, and the Variable Shell

Kira S. Makarova,1,2,4 L. Aravind,1,3 Michael Y. Galperin,1 Nick V. Grishin,1 Roman L. Tatusov,1 Yuri I. Wolf,1,4 and Eugene V. Koonin1,5

1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894 USA; 2 Department of Pathology, F.E. Hebert School of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland 20814-4799 USA; 3 Department of Biology, Texas A&M University, College Station, Texas 70843 USA

Comparative analysis of the protein sequences encoded in the four euryarchaeal species whose genomes have been sequenced completely (Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, and Pyrococcus horikoshii) revealed 1326 orthologous sets, of which 543 are represented in all four species. The proteins that belong to these conserved euryarchaeal families comprise 31%–35% of the gene complement and may be considered the evolutionarily stable core of the archaeal genomes. The core gene set includes the great majority of genes coding for proteins involved in genome replication and expression, but only a relatively small subset of metabolic functions. For many gene families that are conserved in all euryarchaea, previously undetected orthologs in bacteria and eukaryotes were identified. A number of euryarchaeal synapomorphies (unique shared characters) were identified; these are protein families that possess sequence signatures or domain architectures that are conserved in all euryarchaea but are not found in bacteria or eukaryotes. In addition, euryarchaea-specific expansions of several protein and domain families were detected.
In terms of their apparent phylogenetic affinities, the archaeal protein families split into bacterial and eukaryotic families. The majority of the proteins that have only eukaryotic orthologs or show the greatest similarity to their eukaryotic counterparts belong to the core set. The families of euryarchaeal genes that are conserved in only two or three species constitute a relatively mobile component of the genomes whose evolution should have involved multiple events of lineage-specific gene loss and horizontal gene transfer. Frequently these proteins have detectable orthologs only in bacteria or show the greatest similarity to the bacterial homologs, which might suggest a significant role ofhorizontalgenetransferfrombacteriaintheevolutionoftheeuryarchaeota. Phylogenetic analysis of rRNA and a set of proteins 1996; Brown and Doolittle 1997; Edgell and Doolittle involved in translation, transcription, and replication 1997). However, it has been aptly noted that archaea hasledtotheconceptofarchaeaasathirddivisionof have a “eubacterial form and eukaryotic content” life,distinctfromeitherbacteriaoreukaryotes(Woese (Keeling et al. 1994). Indeed, beyond the common et al. 1978, 1990; Woese and Gupta 1981; Pace et al. “negative”trait,namelythesmallcellsizeandtheab- 1986;Zillig1991).Furthermore,rootingofparalogous sence of a nucleus, archaea and bacteria share major trees for translation elongation factors and proton aspects of genome organization and expression strat- ATPases suggested that archaea are a sister group of egy. 
The most important of these common features eukaryotes(Gogartenetal.1989a,b;Iwabeetal.1989; includethe(typically)singlecircularchromosome,the GribaldoandCammarano1998).Thisconceptappears absence of introns in protein-coding genes, the oper- to be gaining further support from the generally eu- onicorganizationofmanygenes,andtheabsenceofa karyoticlayoutofthegenomeexpressionsystems,par- 58-terminal cap and the presence of a ribosomal- ticularly the system of DNA replication whose princi- binding (Shine-Dalgarno) site in archaeal mRNAs palcomponentsareorthologoustotherespectiverep- (Brown and Doolittle 1997). Furthermore, several op- lication proteins of eukaryotes but apparently do not erons,particularlythoseencodingribosomalproteins, havecounterpartsinbacteria(MushegianandKoonin are conserved in archaea and bacteria (Brown and Doolittle1997;KooninandGalperin1997). 4Present address: Institute of Cytology and Genetics, Russian Academy of Theanalysisofthefirsttwocompletelysequenced Sciences,Novosibirsk630090,Russia. archaeal genomes, those of Methanococcus jannaschii 5Correspondingauthor. [email protected];FAX(301)480-9241. (Bult et al. 1996) and Methanobacterium thermoautotro- 608 GenomeResearch 9:608–628©1999byColdSpringHarborLaboratoryPressISSN1054-9803/99$5.00;www.genome.org www.genome.org Downloaded from genome.cshlp.org on January 7, 2023 - Published by Cold Spring Harbor Laboratory Press Archaeal Genomics and Evolution phicum (Smith et al. 1997), showed, somewhat unex- thologs that precludes their detection altogether. As a pectedly given the already established archaeal– result,thefinalstepintheconstructionoftheoriginal eukaryotic clade, that the bacterial form of archaea is collectionofCOGsinvolvedconsiderablemanualcor- complemented by considerable bacterial content. 
It rection.Thedistancesseparatingthefourarchaealspe- hasbecomeclearthatthemajorityofarchaealproteins cies are intermediate between those that are seen show the greatest similarity to their bacterial ho- among close bacterial species such as Escherichia coli mologs,whichislikelytoindicatebacterialorigin,and andHaemophilusinfluenzae(intheoriginalCOGanaly- onlyaminoritylook“eukaryotic”(Kooninetal.1997; sis, these species were not considered independently) Smithetal.1997).Infunctionalterms,thereisaclear and those between phylogenetically remote species splitbetweenthebacterialandeukaryoticcomponents suchasbacteriaandeukaryotes.Inquantitativeterms, ofthearchaealgenomes—theeukaryoticgenesarepri- the mean percent identity of the best hits in all- marilythosecodingforcomponentsofthetranslation, against-all interspecies comparisons of protein se- transcription, and replication machineries, whereas quences is in the range of 41%–46% for the archaea, thebacterialonestypicallyencodemetabolicenzymes 57%forE.coliversusH.influenzae,andbetween30%– andproteinsinvolvedincelldivisionandcellwallbio- 35% for most distant bacterial lineages and bacteria genesis (Koonin et al. 1997; Smith et al. 1997). These versus eukaryotes or archaea (N.V. Grishin, unpubl.; findingsraisedtheissueofpossibleextensivegeneex- ftp://ncbi.nlm.nih.gov/pub/koonin/gen2gen). It ap- changebetweenbacteriaandarchaea(Fengetal.1997; pears that the intermediate level of sequence conser- Kooninetal.1997;DoolittleandLogsdon1998). vationseenamongthearchaeaishighenoughtopre- Subsequently, the complete genome sequences of ventmost,ifnotall,artificiallumpingofCOGsattrib- two additional archaeal species, namely Archaeoglobus utable to paralogous families, but low enough for the fulgidus (Klenk et al. 1997) and Pyrococcus horikoshii consistency criterion to be valid and useful. For these (Kawarabayasi et al. 1998a,b), have been reported. 
All reasons, most of the archaeal COGs delineated by the four available complete archaeal genomes represent automaticprocedurewerecorroboratedbysubsequent only one of the two (or possibly three) main archaeal case-by-case evaluation. Furthermore, given the typi- subdivisions—the Euryarchaeota (Olsen et al. 1994; callyhighlysignificantsimilaritybetweenarchaealor- Pace1997).Nevertheless,theyshowsufficientdiversity thologs,itismostunlikelythatanysignificantnumber toallowus,forthefirsttime,toembarkonasystematic of them have been missed as a result of low sequence comparativeanalysisofarchaealgenomes.Wedescribe conservation. here the results of a detailed comparative analysis of Figure1showsthebreakdownofthearchaealpro- thefourcompleteeuryarchaealproteinsets.Ourprin- teinsetintermsoftheirconservationinthefourcom- cipal approach included the delineation of sets of or- plete genomes. The majority of the proteins in each thologousgenesandexaminationofphylogeneticpat- species—from 58% for P. horikoshii to 71% for M. jan- terns in these families (Tatusov et al. 1997; Koonin et naschii—belong to the archaeal families of likely or- al.1998). thologs(COGs),andanothersizablefraction(from7% RESULTS AND DISCUSSION Orthologous Families Delineated by Comparison of Four Euryarchaeal Genomes and the Principal Types of Events in ArchaealEvolution The proteins encoded in the genomes of the four eu- ryarchaealspeciescompriseaverygoodsetforthede- lineation of families of likely orthologs [designated clustersoforthologousgroups(ofproteins),COGs;Ta- tusov et al. 1997)]. In the original COG analysis, we emphasized that to use consistency between different genomes to support the derivation of COGs, the se- quences of the compared proteins should be maxi- mallyindependent;therefore,thiscriterionworksbest Figure 1 Conserved families and unique proteins encoded in with phylogenetically distant genomes. 
At large phy- thefourcompletearchaealgenomes.(COGs+)Distanthomologs logenetic distances, however, correct identification of of COGs; (NARCHOM) nonarchaeal homologs (only); (unique) proteinswithoutdetectablehomologsinotherspecies(fordetails COGs may be hampered by other problems, such as see text); (Af) Archaeoglobus fulgidus; (Ph) Pyrococcus horikoshii; difficulty in distinguishing orthologs from paralogs, (Mt)Methanobacteriumthermoautrophicum;(Mj)Methanococcus and in some cases, very low similarity between or- jannaschii. Genome Research 609 www.genome.org Downloaded from genome.cshlp.org on January 7, 2023 - Published by Cold Spring Harbor Laboratory Press Makarova et al. most likely that the ancestral form also had been au- totrophic; thus, the absence of the representatives of many COGs in P. horikoshii is best explained by lin- eage-specificgeneelimination.Atleastsomeofthear- chaeal COGs with two members are also likely to re- flectgeneloss.Thus,ahigherestimateforthenumber ofancestralgeneslostineachgenomecanbeobtained byaddingupallCOGswiththreeortwomembersthat donotincludethegivenspecies.Theresultvariesfrom atotalof220genesforM.jannaschiito451genesforP. horikoshii. Thus,theanalysisoftheconservedarchaealfami- lies reveals major genome plasticity, with only a mi- nority of families represented in all genomes. These Figure 2 Representation of the four archaeal species in the observationsmakeallthemorepertinentthequestion: COGs.(F)Archaeoglobusfulgidus;(T)Methanobacteriumthermo- which essential cellular functions are provided by the autrophicum;(J)Methanococcusjannaschii;(H)Pyrococcushoriko- shii. setof543universalarchaealCOGsandwhicharenot represented by it, and, accordingly, are performed by nonorthologous (unrelated or paralogous) proteins in different species—the phenomenon described as non- forM.jannaschiito11%forA.fulgidus)wereidentified orthologous gene displacement (Koonin et al. 1996a; asdistanthomologsoftheCOGs.Amongtheremain- MushegianandKoonin1996). 
ingproteinsthathadnoarchaealhomologs,forarela- tivelysmallfraction(from1%inM.jannaschiito4%in The Core Set of Conserved Euryarchaeal Genes, A.fulgidus),homologsweredetectedinothertaxa(pri- Lineage-Specific Gene Loss, and Nonorthologous marilybacteria),andtherest(~ 20%)hadnodetectable GeneDisplacement homologs. This distribution suggests that a conserved TheCOGsrepresentedinallfoureuryarchaealspecies archaeal gene set does exist. This core gene set, how- aresignificantlyenrichedinproteinsthatareinvolved ever,includesaminorityofthearchaealgenesasindi- in genome expression, compared to the entire collec- cated by the fact that only 543 of the 1326 identified COGs (40%) are represented in all four archaeal spe- cies; the remaining COGs are roughly equally divided betweenthosethatincludethreeandtwospecies(Fig. 2).TheuniversalarchaealCOGsencompass31%–35% of the proteins encoded in each of the individual ge- nomes.Thisnumberappearstobeanimportantmea- sureoftheevolutionarystabilityofthegenomes—the rest of the gene complement in each of the archaea must have been subject to evolutionary events other thanverticalinheritance,suchasduplicationwithsub- sequentrapiddivergence,horizontalgenetransfer,and lineage-specificgeneloss. These results provide at least a rough estimate of thelikelyamountofgenelossineachspecies,aswell as the number of COGs represented in the ancestral euryarchaeon. A conservative estimate of the number Figure3 Distributionofpredictedproteinfunctionsintheuni- versalandnonuniversalsubsetsofthearchaealCOGs.(Bluebars) of genes that might have been lost in each genome is Theuniversalsubset(543COGswithfourmemberseach);(red provided by the number of COGs that include three bars)thenonuniversalsubset(783COGswithtwoorthreemem- archaealspeciesotherthanthegivenone.Thisnumber bers each). 
(Vertical axis) Number of COGs; (horizontal axis) isintherangeof50to70forM.jannaschii,M.thermo- functionalcategories:1,translation,ribosomestructure,andbio- genesis;2,transcription;3,DNAreplication,repair,recombina- autotrophicum,andA.fulgidus,asopposedto206inP. tion; 4, energy production and methanogenesis; 5, amino acid horikoshii (Fig. 2). The greatest number of COGs that metabolism;6,nucleotidemetabolism;7,carbohydratemetabo- arenotrepresentedinP.horikoshiiisnotsurprisingas lism;8,coenzymemetabolism;9,lipidmetabolism;10,molecu- larchaperonesandrelatedfunctions;11,cellwallbiogenesisand it is a heterotrophic organism that lacks a number of celldivision;12,secretionandmotility;13,inorganiciontrans- biosynthetic capabilities (Gonzalez et al. 1998). The port; 14, general functional prediction only; 15, no functional majority of the archaea are autotrophs and it seems prediction. 610 Genome Research www.genome.org Downloaded from genome.cshlp.org on January 7, 2023 - Published by Cold Spring Harbor Laboratory Press Archaeal Genomics and Evolution tion of the archaeal COGs. In particular, most of the routeforprolinebiosynthesisinM.jannaschiiappears basic components of the translation, transcription, to be through the deacetylation of N-acetylglutamate and replication systems are conserved consistently in g-semialdehyde into g-glutamic semialdehyde, fol- allfourspecies;thesameistrueofanumberofproteins lowed by its conversion into pyrroline-5-carboxylate implicatedinrepairandrecombination(Fig.3). and then to proline as shown for bacteria and yeast In other functional categories of genes, the ge- (AdamsandFrank1980).M.jannaschiiencodesanor- nomeplasticityrevealedbyCOGanalysisismorepro- thologoftheN-acetylornithinedeacetylase(ArgE)that nounced.Becauseoftheapparentlossofanumberof catalyzesthefirststepofthispathway.Thesecondstep biosynthetic pathways in the heterotrophic P. 
horiko- of the pathway, conversion of g-glutamic semialde- shii,therearerelativelyfewmetabolicenzymesamong hyde to pyrroline-5-carboxylate, occurs spontane- the all-archaeal COGs, and in fact, it does not seem ously. However, the ortholog of the bacterial enzyme possible to delineate even a single metabolic pathway forthelaststepofprolinebiosynthesis,namelypyrro- that would be completely orthologous in all four ar- line-5-carboxylate reductase (ProC), is not encoded in chaea(Table1).Amongthethreeautotrophicspecies, the M. jannaschii genome and should have been dis- most of the steps of the central pathways are repre- placed by another dehydrogenase that remains to be sented by orthologs; nevertheless, almost each path- identified experimentally. Remarkably, A. fulgidus en- way has at least one step where nonorthologous dis- codesonlytheArgEorthologandM.thermoautotrophi- placement is likely (Table 1). The biosynthesis of cumonlytheProCortholog.Itappearsthatinthiscase, branchedchainaliphaticaminoacids(leucine,isoleu- weobservenonorthologousdisplacementofanentire cine,valine)isanexampleofacomplexpathwaythat (albeit short) pathway whereby acquisition of the or- is,initsentirety,representedbyorthologsinthethree nithinecyclodeaminasegenebyA.fulgidusandM.ther- autotrophicarchaeaaswellasinmostbacteria.Thisis, moautotrophicumhasmadetheenzymesoftheoriginal however,anexceptionratherthantheruleamongthe pathwayofprolinebiosynthesisdispensable. archaealmetabolicpathways—fewofthemconsistex- In addition to the cases of apparent nonortholo- clusively of orthologs of bacterial enzymes. In most gousdisplacement,thereareseveralimportantgapsin pathways,atleastoneortworeactionsarepredictedto ourunderstandingofmetabolicpathwaysinalleuryar- becatalyzedeitherbyknownarchaea-specificenzymes chaeota. The archaeal version of sugar metabolism is orbyyetuncharacterizedones(Table2).Inthereadily particularly puzzling. 
There is no doubt that autotro- detectablecasesofnonorthologousgenedisplacement, phic archaea possess the capabilities to synthesize ri- oneofthealternativesolutionsisfrequentlybasedon bose, deoxyribose, and the sugar components of the orthologsoftherespectivebacterialenzymes,whereas cellenvelope.Itisunclear,however,howtheyaccom- theotheroneseemstobeuniqueforarchaeaandisnot plishthisintheabsenceofaldolase,fructosebisphos- always identifiable. This is, for example, the situation phatase, transaldolase, transketolase, and pentose-5- with a critical reaction in glycolysis, namely the for- phosphate 3-epimerase (see Table 1). Genes for all mation of pyruvate from phosphoenolpyruvate. M. these enzymes are missing in M. thermoautotrophicum jannaschii and P. horikoshii encode an ortholog of the andA.fulgidus,whereasM.jannaschiihasgenescoding bacterial pyruvate kinase that is predicted to catalyze forthethreelatterenzymesbutnottheformertwo.It this reaction. Pyruvate kinase, however, is not detect- appears that compared to bacteria, the archaeal sugar able in the other two archaea. Given that the other metabolism shows systematic nonorthologous dis- components of the trunk portion of the glycolytic placement of enzymes. Interestingly, one of the ar- pathwayarepresentandthatthereactioncatalyzedby chaeal COGs includes predicted aldolases that are pyruvatekinaseisindispensableforthecompletionof highly conserved in all four archaea and are ortholo- glycolysis, nonorthologous displacement must be in- gous to the recently identified class I fructose- voked.Themostlikelydisplacingenzymeisphospho- biphosphate aldolase from E. coli (Thomson et al. enolpyruvate synthase, which is conserved in all ar- 1998).Therearetwoparalogousrepresentativesofthis chaea and might produce pyruvate by reversing its familyofaldolasesinM.jannaschiiandA.fulgidusand typicalreaction. only one member in M. 
thermoautotrophicum and Nonorthologousgenedisplacementisnotablealso P.horikoshii(Table1).Theseenzymesarelikelytocata- in the archaeal amino acid metabolism. For example, lyze key reactions both in pentose and in hexose bio- different archaeal species apparently use radically dif- synthesis;theexactpathwaysremaintobestudiedex- ferentpathwaystosynthesizeproline.InM.thermoau- perimentally. totrophicumandA.fulgidus,prolinecanbeformedfrom Archaeal COGs that contain four or three mem- ornithine in a single reaction catalyzed by ornithine bersaccountforthemajorityofknownhousekeeping cyclodeaminase(Sansetal.1988).M.jannaschiiandP. functions, with several notable exceptions (e.g., those horikoshii lack this enzyme, and while the latter is ex- inthetranslationmachinerydiscussedabove),andin pected to be a proline auxotroph, the only possible a sense, may be considered an idealized minimal ar- Genome Research 611 www.genome.org Downloaded from genome.cshlp.org on January 7, 2023 - Published by Cold Spring Harbor Laboratory Press Makarova et al. Table1. 
OrthologousandNonorthologousMetabolicPathwaysandEnzymesinArchaea Nonorthologous Orthologs Genes genedisplacement: ofbacterial missing orthologsof genesfound inall bacterialgenes inallfour four foundonly Enzymes(genes) archaeal archaeal insomeofthe Consequencesfor Pathway inthepathwaya genomes genomes archaealgenomes archaealmetabolism Glycolysis hexokinase(glk),phosphogluco- dhnA,tpi, glkpfkA pgiispresentonlyin bacterial-typehexokinase isomerase(pgi), gapA,pgk, Mj;pykAispresent andphosphofructokinase phosphofructokinase(pfkA), pgmd,eno inMjandPh,not areapparentlydisplaced aldolase(fba/dhnA), foundinMtandAf bynonorthologous triosephosphateisomerase ADP-dependentenzymes; (tpi),glyceraldehyde thelackofpyruvate 3-phosphatedehydrogenase kinaseinMtandAfis (gapA),3-phosphoglycerate probablycompensated kinase(pgk),phosphoglycero- byphosphoenolpyruvate mutase(pgm/yibO),enolase synthaseworkinginthe (eno),pyruvatekinase(pykA) reversedirection. Gluco- phosphoenolpyruvatesynthase ppsA,eno, fbp pgiispresentonlyin otherphosphohexomutases neogenesis (ppsA),enolase(eno),phos- pgm,pgk, Mj,notfoundin (e.g.,phosphomannomu- phoglyceromutase(pgm), gapA,tpi, Mt,Af,andPh tase)areprobablyused 3-phosphoglyceratekinase dhnA forpolysaccharide (pgk),glyceraldehyde3-phos- biosynthesisinMt,Af, phatedehydrogenase(gapA), andPh. triosephosphateisomerase (tpi),aldolase(fba/dhnA), fructosebisphosphatase(fbp), phosphoglucoisomerase(pgi) Pentose glucose-6-phosphatedehydro- rpiA zwf, tktAgeneissplitin thispathwayisnotfunc- phosphate genase(zwf),6-phosphoglu- gnd Mj,absentinMt, tionalinanyofthese shuntand conatedehydrogenase(gnd), Af,andPh;talAand archaea.Themechanism pentose transketolase(tktA);trans- yhfDarepresentin ofpentosephosphate biosynthesis aldolase(talA),pentose-5- Mj,notinMt,Af,or biosynthesisisnotclear. 
phosphate-3-epimerase(yhfD), Ph;deoCispresent ApredictedDhnA-type ribose5-phosphateisomerase onlyinMt aldolasethatishighly (rpiA),deoxyribose-phosphate conservedinallfour aldolase(deoC) archaea(MJ0400, MJ1585;MTH579; AF0108,AF0230; PH0082)maycatalyze theformationofribose fromglyceraldehyde- 3-phosphateandacetal- dehyde.Alternatively,in Mt,thisreactionmight becatalyzedbytheDeoC ortholog(MTH818). Entner– glucose-6-phosphate edd zwf, archaealacktheclassical Doudoroff dehydrogenase(zwf), eda Entner–Doudoroff pathwayb,c 6-phosphogluconate path-wayandinstead dehydratase(edd), appeartopossessa 2-keto-3-deoxy-6-phospho- modified,nonphos- gluconatealdolase(eda) phorylatedver-sion.In Mj,Af,andMt,members oftheabovenewfamily ofaldolasesmayfunction inthispathwayas2-keto- 3-de-oxygluconatealdo- lase(anonorthologous displace-mentofEda). Thearchaealgluconate dehydrataseremains unknown,whichpre- cludesacompleterecon- structionofthispathway. TCAcyclec citratesynthase(gltA),aconitase acnA,icd, sucA, gltAispresentinMt MtandAfcanreduce (acnA),isocitrate sucC,sucD, sucB andAf,butnotin a-ketoglutaratetocitrate dehydrogenase(icd), frdA,frdB, MjorPh;Phhas andfurthertosuccinate 612 Genome Research www.genome.org Downloaded from genome.cshlp.org on January 7, 2023 - Published by Cold Spring Harbor Laboratory Press Archaeal Genomics and Evolution Table1. (Continued) Nonorthologous Orthologs Genes genedisplacement: ofbacterial missing orthologsof genesfound inall bacterialgenes inallfour four foundonly Enzymes(genes) archaeal archaeal insomeofthe Consequencesfor Pathway inthepathwaya genomes genomes archaealgenomes archaealmetabolism a-ketoglutaratedehydro- fumA,mdh fumAand,possibly, orsuccinyl-CoA;inMj genase(sucA,sucB),succinyl- acnAgenes onlythepartfrom CoAsynthase(sucC,sucD), oxaloacetateto fumaratereductase(frdA,frdB), succinyl-CoAisoperative. 
fumarase(fumA),malate dehydrogenase(mdh) Purinebio- phosphosphoribosylpyro- prsA,purF, purK, purTispresentin allfourarchaeaare synthesis phosphatesynthase(prsA), purD,purL, purH2 MjandPhbutnot probablycapableof amidophosphoribosyl- purM, inMtandAf; purinesynthesisdenovo; transferase(purF),GAR purE,purC, purH1ispresent carboxylationofAIR synthase(purD),GARtrans- purB,purA, onlyinAf;guaB is probablyoccurs formylase(purN/purT),FGAM guaA presentinMj,Mt, spontaneously.Thestill synthase(purL),AIRsynthase andPh,butnotin unidentifiedenzymesthat (purM),NCAIRsynthase AF catalyzeformylationof (purK),NCAIRmutase(purE), SAICARandAICARinall SAICARsynthase(purC), fourarcheaeaand adenylosuccinatelyase(purB), formylationofGARinMt AICARtransformylase(purH2), andAfapparentlyuse IMPcyclohydrolase(purH1), formateandATPas adenylosuccinatesynthase substrates.IMP (purA),IMPdehydrogenase dehydrogenaseismissing (guaB),GMPsynthase(guaA), AFandisprobably displacedbya nonorthologous dehydrogenase. Pyrimidine carbamoylphosphatesynthase pyrB,ygeZ, none carAandcarBare Mj,Mt,AfandprobablyPh biosynthesis (carA,carB),aspartate pyrD,pyrE, missinginPh arecapableofpyrimidine carbamoyltransferase(pyrB), pyrF,pyrH, synthesisdenovo.The dihydroorotase(pyrC/ygeZ), ndk,pyrG identityofcarbamoyl- dihydroorotatedehydrogenase phosphatesynthaseinPh (pyrD),orotatephosphoribo- remainsunclear. 
syl-transferase(pyrE),oroti- dine-5’-phosphatedecarboxy- lase(pyrF),UMPkinase(pyrH), NDPkinase(ndk),CTP synthase(pyrG) Histidine phosphosphoribosylpyro- prsA,hisG, none hisI2ispresentinMj Mj,Mt,andAfareall biosynthesisc phosphatesynthase(prsA), hisI1,hisA, andMt,butnotin capableofhistidine ATP-phosphoribosyltransferase hisH,hisF, Af;hisB1isfound biosynthesis;phospho- (hisG),phosphoribosyl-ATP hisB2,hisC, onlyinMj;prsA ribosyl-ATPpyrophospha- pyrophosphatase(hisI2), hisD andhisCgenesare taseinAfisprobably phosphoribosyl-AMPcyclo- presentinPh displacedbysomeother hydrolase(hisI1),58-ProFAR ATP/ADPase;allarchaea isomerase(hisA),imidazole- encodedistanthomo- glycerolphosphatesynthase logsofyeasthistidinol (hisH,hisF),imidazoleglycerol phosphatase(HDsuper- phosphatedehydratase familyhydrolases;Aravind (hisB2),histidinolphosphate andKoonin1998b),one aminotransferase(hisC), ofwhichmightdisplace histidinolphosphatase(hisB1), HisB1inAfandMt. histidinoldehydrogenase(hisD) Branched threoninedeaminase(ilvA), ilvA,ilvB, none none allenzymesofleucine, chain acetohydroxyacidsynthase ilvN,ilvC, isoleucine,andvaline aminoacids (ilvB,ilvN),acetohydroxyacid ilvD,leuA, biosynthesisinbacteria, bio- isomeroreductase(ilvC), leuC,leuD, archaea,andyeastare synthesisc dihydroxyaciddehydratase leuB,ilvE orthologous. (ilvD),2-isopropylmalate synthase(leuA), isopropylmalateisomerase (leuC,leuD), 3-isopropyl-malate Genome Research 613 www.genome.org Downloaded from genome.cshlp.org on January 7, 2023 - Published by Cold Spring Harbor Laboratory Press Makarova et al. Table1. 
(Continued) Nonorthologous Orthologs Genes genedisplacement: ofbacterial missing orthologsof genesfound inall bacterialgenes inallfour four foundonly Enzymes(genes) archaeal archaeal insomeofthe Consequencesfor Pathway inthepathwaya genomes genomes archaealgenomes archaealmetabolism dehydrogenase(leuB), glutamatetransaminase(ilvE) Aromatic 3-deoxyheptulosonate aroD,aroE, aroG, none themechanismof aminoacids 7-phosphatesynthase(aroG/ aroA,aroC, aroB, 3-dehydroquinate biosynthesisc kdsA),3-dehydroquinate pheA1, aroK synthesisremainsunclear; synthase(aroB),3-dehydro- pheA2, shikimatephosphorylation quinatedehydratase(aroD), tyrA2,tyrB, inallautotrophicarchaea shikimatedehydrogenase trpD1, isprobablyperformedby (aroE),shikimatekinase(aroK), trpE, anarchaea-specifickinase. 5-enolpyruvoylshikimate trpD2, 3-phosphatesynthase(aroA), trpC2, chorismatesynthase(aroC), trpC1, chorismatemutase(pheA1), trpA,trpB prephenatedehydratase (pheA2),prephenate dehydrogenase(tyrA2), tyrosineaminotransferase (tyrB),antranilatesynthase (trpD1,trpE),antranilate phosphoribosyl-transferase (trpD2),phosphoribosylan- tranilateisomerase(trpC2), indole-glycerolphosphate synthase(trpC1),tryptophan synthase(trpA,trpB) Threonine aspartokinase(thrA1),aspartate thrA1,asd, none thrBispresentinMj inMtandAf,homoserine biosynthesis semialdehydedehydrogenase thrA2,thrC andPh,butnotin kinaseisprobably (asd),homoserinedehydroge- MtandAf displacedbyadifferent nase(thrA2),homoserine kinase. kinase(thrB),threonine synthase(thrC) Methionine aspartokinase(metL1),aspartate metL1,asd, metA, metB/metCisfound inallfourarchaea,thethree biosynthesis semialdehydedehydrogenase metL2, onlyinPh,notin stepsleadingfrom (asd),homoserinedehydro- metE Mj,Mt,orAf homoserineto genase(metL2),homoserine homocysteineare transsuccinylase(metA), probablydisplacedbya cystathionineg-synthase singlereactioncatalyzed (metB),b-cystathionase byasulfurtransferase. 
(metC),methioninesynthase (metE/metH) Arginine acetylglutamatesynthase argA2,argB, none argEispresentinMj, inMt,acetylornithinaseis biosynthesis (argA2),acetylglutamate argC, Af;andPh,butnot probablydisplacedbya kinase(argB),acetylglutamate argD,argF, inMt;argGispres- differentacetyltransferase. phosphatereductase(argC), argH entinMj,Mtand acetylornithineaminotrans- AfbutnotinPh ferase(argD),acetylornithinase (argE),ornithinecarbamoyl- transferase(argF),argininosuc- cinatesynthase(argG), argininosuccinatelyase(argH) NAD aspartateoxidase(nadB), nadB,nadC, none nadAispresentinMj, nadAisprobablydisplaced biosynthesis quinolinatesynthase(nadA), nadE Mt,andPh,but inAFbyadifferent quinolinatephosphoribosyl- notinAF enzyme;nadDgene transferase(nadC),nicotinic (predictedE.coliyneB) acidmononucleotideadenylyl- remainsunidentifiedin transferase(nadD),deamido- bacteriaandarchaea. NADammonialigase(nadE) Riboflavin GTPcyclohydrolaseII(ribA), ribD2,ribB, ribC ribAandribD1are ribCisdisplacedbyan biosynthesisc pyrimidinedeaminase(ribD1), ribE presentinAfbut archaea-specificriboflavin pyrimidinereductase(ribD2), notinMjorMt synthase(Eberhardtetal., 614 Genome Research www.genome.org Downloaded from genome.cshlp.org on January 7, 2023 - Published by Cold Spring Harbor Laboratory Press Archaeal Genomics and Evolution Table1. (Continued) Nonorthologous Orthologs Genes genedisplacement: ofbacterial missing orthologsof genesfound inall bacterialgenes inallfour four foundonly Enzymes(genes) archaeal archaeal insomeofthe Consequencesfor Pathway inthepathwaya genomes genomes archaealgenomes archaealmetabolism 3,4-dihydroxybutanone-4- 1997);themechanismof phosphatesynthase(ribB), 2,5-diamino-6-ribosyl- 6,7-dimethyl-8-ribityllumazine amino-4-pyrimidone synthase(ribE),riboflavin 58-phosphateformation synthase(ribC) MjandMtremains unclear. 
Siroheme Glutamyl-tRNAreductase hemA,hemL, none none allenzymesofsiroheme biosynthesisc (hemA),glutamate hemB, biosynthesisinarchaea 1-semialdehydeaminotrans- hemC, areorthologousto ferase(hemL),probilinogenIII hemD, bacterialones. synthase(hemB),hydroxy- cysG2, methylbilanesynthase(hemC), cysG1 uroporphyrinogenIIIsynthase (hemD),uroporphirinogen methyltransferase(cysG2), dimethyluroporphirinogenIII dehydrogenase(cysG1) Cobalamin uroporphyrinogenIIImethylase cysG2,cbiH, cobA, cbiL,cbiJ,cbiT,and inMj,Mt,andAf,cobAand biosynthesisc (cysG2),precorrin-2methylase cbiF,cbiE, cobD cobNarepresent cobDgeneproductsare (cbiL),precorrin-3Bmethylase cbiC,cbiA, inMjandMtbut probablydisplacedby (cbiH),precorrin-4methylase cbiP,cbiB, notinAf;cobTis archaea-specificadenosyl- (cbiF),precorrin-6Areductase cobS foundonlyinAf andaminotransferases, (cbiJ),precorrin6Bmethylase respectively;itisnotclear (cbiE),precorrin6Bdecarbox- whetherthispathwayis ylase(cbiT),precorrin-8x functionalinAf. isomerase(cbiC),cobyrinic acida,c-diamidesynthase (cbiA),cobaltinsertionprotein (cobN),cob(I)alaminadenosyl- transferase(cobA),cobyricacid synthase(cbiP),cobyricacid aminotransferase(cobD), cobinamidesynthase(cbiB), nicotinate-nucleotide:dimethyl- benzimidazolephosphoribosyl- transferase(cobT),cobalamin synthase(cobS) Biotin pimeloyl-CoAsynthetase(bioW)e, none none alltheenzymesof probablyonlyMjiscapable biosynthesis 7-keto-8-aminopel- thepathwayare ofbiotinbiosynthesis. argonatesynthetase(bioF), presentinMjbut 7,8-diaminopelargonate onlyoneortwo aminotransferase(bioA), enzymescanbe dethiobiotinsynthetase(bioD), foundinMt,Af, biotinsynthetase(bioB), andPh biotin-[acetyl-CoAcarboxylase] holoenzymesynthetase(birA) aThegenesandpathwaysfollowthebiochemicaldataandnomenclaturedescribedforE.coliandS.typhimurium(Neidhardtetal. 
1996). Genes coding for multidomain proteins with more than one enzymatic activity are divided into separate domains, starting from the amino-terminal domain. Known cases of nonorthologous gene displacements are indicated with a slash; genes encoding different subunits of the same enzyme are separated by commas.
^b 6-Phosphogluconate dehydratase gene (edd) is apparently present in Mj, Mt, and Af, but these ORFs probably function as dihydroxyacid dehydratases (ilvD), a paralog of edd.
^c Most enzymes of these pathways are not encoded in the Pyrococcus horikoshii genome; exceptions are indicated.
^d Orthologs of B. subtilis pgm gene, corresponding to the E. coli yibO.
^e Orthologs of B. subtilis gene bioW, not found in E. coli.

Table 2. Synapomorphies in Euryarchaeota (Examples)

| COG description | Representative (Mj) | Representative (Mt) | Representative (Af) | Representative (Ph) | Unique shared characters, nonarchaeal homologs (see also Fig. 8) |
|---|---|---|---|---|---|
| DNA polymerase II large subunit | MJ1630 | MTH1536 | AF1722 | PHBN021 | a highly conserved archaeal enzyme without similarity to any other proteins, with the exception of a C4 Zn finger resembling those in eukaryotic DNA polymerase δ. |
| DNA polymerase II small subunit, predicted phosphohydrolase of the calcineurin-like superfamily | MJ0702 | MTH1405 | AF1790 | PHBN023, PHAZ021 | Predicted active phosphohydrolase (phosphatase) in archaea and an inactivated form in eukaryotes (Aravind and Koonin 1998a). |
| Predicted ATP-dependent DNA ligase | MJ0414 | MTH1221 | AF0849 | PHBG013 | very limited similarity to other ATP-dependent DNA ligases except for one from Aquifex aeolicus, probably due to horizontal transfer (Altschul and Koonin 1998). |
| DNA excision repair enzyme | MJ1505 | MTH1415 | AF0358 | PHAI012 | consists of a typical helicase domain and a nuclease domain as opposed to the apparent eukaryotic orthologs (ERCC4/RAD1) in which the helicase domain appears to be inactivated (Aravind et al. 1999). |
| DnaG-type primase-like proteins | MJ1206 | MTH891 | AF1899 | PHAN003 | unique domain organization with an N-terminal helicase motif combined with the DnaG-type (Toprim) domain (Aravind et al. 1998). |
| DNA-directed RNA polymerase subunit (E′/E″) | MJ0396/MJ0397 | MTH264/MTH265 | AF1116/AF1117 | PHBT008/in PH744 | two single domain (S1 and C4 Zn finger domains) proteins in all Euryarchaeota; a fusion in Sulfolobus; only the S1 domain protein is an RNA polymerase subunit in eukaryotes (Fig. 8). |
| DNA-dependent RNA polymerase A′/A″ subunits | MJ1042/MJ1043 | MTH1051/MTH1052 | AF1888/AF1889 | PHCB020/PHCB021 | the split of the largest RNA polymerase subunit gene into two adjacent genes is unique to archaea. Both eukaryotes and bacteria encode highly conserved orthologs of the archaeal A′ and A″ subunits as a single polypeptide (the β′-subunit in bacteria). |
| Predicted HTH transcriptional regulators | MJ0188 | MTH1282 | AF1259 | PHCN020 | unique domain organization: two CBS domains fused with an HTH domain. |
| Archaeosine synthetase (archaea-specific tRNA modification) | MJ1022 | MTH1665 | AF0587 | PHBN035 | two-domain architecture, with an additional, predicted RNA-binding PUA domain, as opposed to bacterial homologs (queuine synthetases) that consist of the enzymatic domain alone (Fig. 8; Aravind and Koonin 1999a). |
| Translation elongation factor EF-1β | MJ0459 | MTH1699 | AF0574 | AP000001 | the archaeal EF-1β is a small protein of ~120 amino acids whereas all eukaryotic homologs (orthologs?) contain an additional domain homologous to GSTs (Koonin et al. 1994). |
| ATP-dependent protease Lon | MJ1417 | MTH785 | AF0364 | PHBH031 | only the carboxy-terminal, protease domain is highly conserved in archaea and bacteria; the amino-terminal ATPase domain in the archaeal proteins is distinct from the ATPase domain of Lon. |
| PilT family ATPase | MJ1533 | MTH246 | AF1951 | PHBP012 | unique domain organization: ATPase + amino-terminal PIN domain. |
| GMP synthetase subunits: PP-family ATPases and glutamine amidotransferase | MJ1131 | MTH710 | AF0253 | PHAU017 | ATPase (top row) and glutamine amidotransferase (bottom row) moieties of the GMP synthetase are separate polypeptides. Orthologs of each subunit in bacteria and eukaryotes are domains of a single protein. |
| | MJ1575 | MTH709 | AF1320 | PHAU016 | |
| Predicted enzyme with an ATP-grasp domain and a redox active center | MJ0202 | MTH1744 | AF1104 | PHBQ042 | ortholog with the same domain architecture only in Aquifex; all other homologs are distantly related and lack the redox center. |

chaeal gene complement. The COGs with two members appear to account for more specific functions linked to the organism's particular life style, for example, a number of COGs that include enzymes involved in methanogenesis in M. jannaschii and M. thermoautotrophicum.

Relationships Between Euryarchaeal Protein Families and Their Bacterial and Eukaryotic Homologs

The majority of the archaeal COGs have homologs in other taxa. In the present analysis, we attempted to distinguish carefully between true orthology (see Methods) and other homologous relationships that typically include weak sequence conservation or differences in domain architectures. There are notable differences in the distribution of the apparent phylogenetic affinities for the COGs represented in all archaea (universal) and those that include only two or three archaeal species. For >50% of the universal archaeal COGs, orthologs were identified in both bacteria and eukaryotes, in sharp contrast to the nonuniversal COGs, for which this fraction was only 28% (Fig. 4A,B). A significant majority of the COGs that have only bacterial orthologs are not conserved in all archaea, whereas most of the COGs that have only eukaryotic orthologs belong to the universal subset (Fig. 4A,B). Furthermore, those COGs that do not have any homologs outside the archaea are poorly represented in the universal subset.

A complementary, quantitative analysis of the distribution of sequence similarities supports these observations. Archaeal proteins from the COGs that include only two or three species typically show the greatest similarity to bacterial homologs, in contrast to the universal COGs that are significantly enriched in proteins most similar to the eukaryotic homologs (Fig. 5). This difference might reflect true phylogenetic affinities, a difference in evolutionary rates in different functional categories of proteins, or both. However, the finding that COGs consisting of two to three euryarchaeal members typically show a greater similarity to bacterial homologs might suggest a significant contribution of horizontal transfer of bacterial genes into archaea.

The functional distinction between bacterial and eukaryotic COGs in archaea is clear-cut and is related to the functional difference between the universal and specialized subsets discussed above (see Fig. 3). The bacterial COGs within the universal subset comprise primarily proteins involved in energy production (e.g., ferredoxins and numerous components of hydrogenase complexes), certain metabolic functions, such as coenzyme biosynthesis, and transport system components. Interestingly, this bacterial set also includes enzymes involved in protein degradation and potentially in chaperone-like functions, such as three families of previously undetected predicted zinc-dependent proteases (K.S. Makarova, L. Aravind, and E.V. Koonin, unpubl.). Furthermore, the bacterial component of the universal COG subset includes several repair enzymes, proteins involved in cell division, for example, chromosome partitioning ATPases, and stress response proteins, such as the homologs of the bacterial universal stress protein UspA.

Figure 4 Taxonomic distribution of nonarchaeal homologs for universal and nonuniversal subsets of the archaeal COGs. (A) The universal subset (543 COGs with four members each); (B) the nonuniversal subset (783 COGs with two or three members each).

Figure 5 Relationship between members of the universal and nonuniversal subsets of the euryarchaeal COGs from M. jannaschii and A. fulgidus and their bacterial and eukaryotic homologs. (1) M. jannaschii, the universal subset; (2) M. jannaschii, the nonuniversal subset; (3) A. fulgidus, the universal subset; (4) A. fulgidus, the nonuniversal subset. (Bacterial) Reliable best hits to bacterial proteins; (eukaryotic) reliable best hits to eukaryotic proteins. A reliable best hit was defined as one with an e-value at least 10000 times lower than that for the other division (eukaryotic or bacterial, respectively). Only the hits with e-values <0.001 were analyzed. (Red bars) Bacterial; (blue bars) eukaryotic; (yellow bars) uncertain.
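The reliable-best-hit criterion stated in the Figure 5 legend can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the names `classify_affinity`, `tally_affinities`, `E_VALUE_CUTOFF`, and `RELIABILITY_RATIO` are hypothetical, as is the use of `None` for a division with no reported hit. The legend does not say how to treat a protein that has a hit in only one division, so counting such a hit as reliable is an assumption made here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

E_VALUE_CUTOFF = 1e-3    # only hits with e-values below this are analyzed (Fig. 5 legend)
RELIABILITY_RATIO = 1e4  # a reliable best hit is at least 10000 times lower than the other


def classify_affinity(bacterial_e: Optional[float],
                      eukaryotic_e: Optional[float]) -> str:
    """Classify one archaeal protein from its best e-values in each division.

    Returns 'bacterial', 'eukaryotic', 'uncertain', or 'none'.
    None means that no hit was reported for that division (an assumption).
    """
    hits = {"bacterial": bacterial_e, "eukaryotic": eukaryotic_e}
    # Keep only the divisions whose best hit passes the significance cutoff.
    significant = {name: e for name, e in hits.items()
                   if e is not None and e < E_VALUE_CUTOFF}
    if not significant:
        return "none"

    best = min(significant, key=significant.get)
    other = "eukaryotic" if best == "bacterial" else "bacterial"
    other_e = hits[other]

    # Assumption: with no hit at all in the other division, the best hit
    # is treated as reliable.
    if other_e is None:
        return best
    # The best e-value must be at least 10000 times lower than the other one.
    if other_e > 0 and significant[best] * RELIABILITY_RATIO <= other_e:
        return best
    return "uncertain"


def tally_affinities(
    pairs: Iterable[Tuple[Optional[float], Optional[float]]]
) -> Counter:
    """Count the classifications over (bacterial_e, eukaryotic_e) pairs."""
    return Counter(classify_affinity(b, e) for b, e in pairs)
```

Applying `tally_affinities` separately to the proteins of the universal and nonuniversal COGs of one genome would give counts corresponding to the bacterial, eukaryotic, and uncertain bars of one panel. Comparing the best hit against the unfiltered e-value of the other division is a deliberately conservative choice: a weak hit in the other division still prevents a "reliable" call unless the 10000-fold margin is met.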