ebook img

Evidence for lateral gene transfer between Archaea and Bacteria PDF

16 Pages·1999·4.44 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Evidence for lateral gene transfer between Archaea and Bacteria

articles Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima KarenE.Nelson,RebeccaA.Clayton,StevenR.Gill,MichelleL.Gwinn,RobertJ.Dodson,DanielH.Haft, ErinK.Hickey,JeremyD.Peterson,WilliamC.Nelson,KarenA.Ketchum,LisaMcDonald,TeresaR.Utterback, JoelA.Malek,KatjaD.Linher,MinaM.Garrett,AshleyM.Stewart,MatthewD.Cotton,MatthewS.Pratt, CherylA.Phillips,DelwoodRichardson, JohnHeidelberg,GrangerG.Sutton,RobertD.Fleischmann, JonathanA.Eisen,OwenWhite,StevenL.Salzberg,HamiltonO.Smith,J.CraigVenter&ClaireM.Fraser TheInstituteforGenomicResearch,9712MedicalCenterDrive,Rockville,Maryland20850,USA ........................................................................................................................................................................................................................................................ The1,860,725-base-pairgenomeofThermotogamaritimaMSB8contains1,877predictedcodingregions,1,014 (54%)ofwhichhavefunctionalassignmentsand863(46%)ofwhichareofunknownfunction.Genomeanalysis revealsnumerouspathwaysinvolvedindegradationofsugarsandplantpolysaccharides,and108genesthathave orthologuesonlyinthegenomesofotherthermophilicEubacteriaandArchaea.OftheEubacteriasequencedtodate, T.maritimahasthehighestpercentage(24%)ofgenesthataremostsimilartoarchaealgenes.Eighty-one archaeal-likegenesareclusteredin15regionsoftheT.maritimagenomethatrangeinsizefrom4to20kilobases. ConservationofgeneorderbetweenT.maritimaandArchaeainmanyoftheclusteredregionssuggeststhatlateral genetransfermayhaveoccurredbetweenthermophilicEubacteriaandArchaea. Thermotogamaritima,anon-spore-forming,rod-shapedbacterium nucleotides, using the coding-analysis program GLIMMER9 (see belongingtotheorderThermotogales,wasoriginallyisolatedfrom Methods). Coding sequences cover 95% of the chromosome. geothermalheatedmarinesedimentatVulcano,Italy1,andhasan Predictedproteinsequencesweresearchedagainstanon-redundant optimum growth temperature of 80(cid:56)C. T. maritima metabolizes proteindatabaseandbiologicalroleswereassignedto1,014(54%) many simple and complex carbohydrates including glucose, sucrose, starch, cellulose and xylan1,2. Both cellulose and xylan, 1 1,800,000 through conversion to fuels (such as H ), have great potential as 100,000 2 renewablecarbonandenergysources. 1,700,000 200,000 T. maritima is also of evolutionary significance, because small- subunit ribosomal RNA (SSU rRNA) phylogeny has placed this 1,600,000 bacteriumasoneofthedeepestandmostslowlyevolvinglineagesin 300,000 theEubacteria3.Toelucidatefurtheritsuniquemetabolicproperties and evolutionary relationship to other microbial species, we 1,500,000 sequencedthegenomeofthetypestrainT.maritimaMSB8using 400,000 the whole-genome random-sequencing method previously described4,5. 1,400,000 500,000 Generalfeaturesofthegenome The genome of T. maritima is a single circular chromosome 1,300,000 consisting of 1,860,725 base pairs (bp) (Fig. 1) with an average 600,000 GþC content of 46%. A single rRNA operon (16S–23S–5S), containinganisoleucinetransferRNAandanalanyltRNA inthe 1,200,000 spacerregionbetweenthesmall-andlarge-subunitgenes,corresponds 700,000 totheoneregionofthechromosomewithasignificantlyhigherGþC 1,100,000 content(62%).AregionofsignificantlylowerGþCcontent(34%) 800,000 1,000,000 900,000 encodeslipopolysaccharidebiosynthesis(LPS)proteins. On the basis of analysis of GþC ratio, G–C skew6 Figure 1 Circular representation of the T. maritima MSB8 genome showing (G(cid:50)C=GþC) and asymmetric distribution of oligomers7 in the predicted-coding regions and other features. Outer circle, predicted protein- T.maritimagenome,wecouldnotidentifyacharacteristicbacterial codingregionsontheplusstrandclassifiedbyroleaccordingtothecolourcode origin of replication, a situation similar to that observed in the inFig.2(unknownsandhypotheticalsareinblack).Secondcircle,predicted genomesoftheArchaeaMethanococcusjannaschii8andArchaeoglobus protein-codingregionsontheminusstrand.Thirdcircle,(cid:120)2compositionin2000- fulgidus5.Weassignedbase-paironeofthegenomeatthebeginning bpwindows(seeMethods);bandscorrespondto(cid:120)2valueswithP(cid:60)1:9(cid:51)10(cid:50)9. ofthelongeststretch(2.6-kb)of30-bprepeats(Fig.2). Fourthcircle,Archaea-likeislandsonthegenome.Fifthcircle,smallrepeats.Sixth Open reading frames. We identified 1,877 open reading frames circle,largerepeats(black),largerepeatsassociatedwithsmallrepeats(red). (ORFs) (Figs 1, 2; Tables 1, 2), with an average size of 947 Seventhandeighthcircles,rRNAsandtRNAs,respectively. © 1999 Macmillan Magazines Ltd NATURE|VOL399|27MAY1999|www.nature.com 323 articles Table1GeneralfeaturesoftheT.maritimaMSB8genome ThepredominantmechanismoftransportinT.maritimaisATP- coupled solute flux. Eighty-four percent of the proteins in the Generalfeatures ............................................................................................................................................................................. transport category are subunits of ATP-binding cassette (ABC) Lengthofsequence 1,860,725 transporters.Phylogeneticcomparisons oftheperiplasmicsolute- G+Cratio 46% Totalno.ofsequences 30,140 binding protein (SBP) component (Fig. 4) roughly parallels the Averagereadlength(bp) 531 familiesdefinedinotherEubacteria12withamarkedexpansionof Openreadingframes 1,877 proteinsspecificforoligopeptides.Nineoftheelevenoligopeptide Proteincodingregions 95% Ribosomals 15S–16S–23S systems appear to be in operons with genes essential for sugar tRNAs 46(10clusters/19singlegenes) metabolism (Fig. 5). Of the published genomes, only Pyrococcus ............................................................................................................................................................................. Chromosomalcodingsequences horikoshii13hasanoligopeptidetransporterassociatedwithasugar ............................................................................................................................................................................. degradingenzyme((cid:98)-galactosidase).Thisoperonicstructuresug- No.similartoknownproteins 1,014 No.ofconservedhypotheticals 407 gests that there is coordinate regulation of peptide import with No.similartoproteinsofunknownfunction 83 sugardegradation inthesetwoorganisms,althoughitisfarmore No.withoutadatabasematch 373 extensive in T. maritima. This contrasts with classical regulatory Total 1,877 ............................................................................................................................................................................. networks where the transported substrate affects transcription of Repeats theABCtransportergenes14. ............................................................................................................................................................................. Class Length Copies Databasematch Almost7%ofthepredictedcodingsequencesintheT.maritima SR-01 30 143 tttccatacctctaaggaattattgaaaca genomeareinvolvedinmetabolismofsimpleandcomplexsugars, LR-01 1,897 2 hypotheticalprotein more than twice the percentage seen in other eubacterial and LR-02 1,403 2 (cid:97)-glucosidase archaealspeciessequencedtodate.Severalgenesencodingproteins LR-03 1,137 4 putativetransposase involvedinthesequentialdegradationofxylanarepresent.Genes LR-04 1,082 2 methyl-acceptingchemotaxisprotein LR-05 858 2 putativetransposase encoding endoglucanases (celA/B) and (cid:98)-glucosidases (bglB) that LR-06 555 2 helicase areinvolvedincellulosedegradationwereidentified,confirmingthe LR-07 252 2 excinuclease presenceofthecellulolyticsystemspredictedbybiochemicalstudies LR-08 241 2 putativetransposase ............................................................................................................................................................................. ofthisorganism15.T.maritimadoesnothaveacomplexsystemfor the degradation of plant cellulosic materials, as described for the ofthemusingtheclassificationschemeadaptedfromref.10.Four- thermophilic bacterium Clostridium thermocellum, in which the hundred-and-seven (22%) predicted coding sequences matched degradationofcellulosedependsonamultienzymecomplexknown hypotheticalcodingsequencesfromotherspecies,and373(20%) asthecellulosome,composedofbetween14and26subunits16. hadnodatabasematch.Forty-sixstabletRNAswithspecificityfor GlucosecatabolisminT.maritimainvolvestheEmbden-Meyer- all20aminoacidswereidentified. hof and Entner-Doudoroff glycolytic pathways. In addition, the Repeats.TheT.maritimagenomehas143copiesofa30-bprepeat non-oxidativebranchofthepentose-phosphatepathwayappearsto found in eight distinct clusters on the chromosome (Figs 1, 2). beinvolvedinglucosebreakdown.Genomeanalysisindicatesthat The 30-bp repeats are interspersed with a unique 39–40-bp T. maritima can metabolize glycerol, gluconate and numerous sequenceandarefollowedbya452-bpsequencetoformastructure sugars including amylose, maltose and galactose, as well as the identicaltothatdescribedforthegenomesofM.jannaschii8andA. amino acids aspartate, threonine and glycine (Fig. 3). CoA-SH- fulgidus5.OurexaminationoftheAquifexaeolicusgenome11reveals dependentferredoxinoxidoreductasesspecificforpyruvate,aswell thatsimilarrepeatsarepresent,buttheyarefewerinnumber,and aspartialandcompleteoperonsforferredoxinoxidoreductasesof shorter, than those in T. maritima. These small/large repeat unknownspecificity,arealsopresentonthegenome. structures have only been identified in the genome sequences of Biosynthetic pathways for nine amino acids were identified in thermophiles. T. maritima (Fig. 3). In addition, genes that encode proteins for Inaddition tothe30-bprepeatstructures,eightclassesoflarge the biosynthesis of biotin, folic acid, haem, porphyrin, lipoate, repeats,morethan200bpinlengthandwith(cid:46)95%identitytoeach menaquinone, ubiquinone, pantothenate and pyridoxine were other,arepresentintheT.maritimagenome(Fig.1,Table1). identified. T. maritima may also synthesize glycogen as a storage Multigenefamilies.Weidentified214genefamilies(P(cid:35)10(cid:50)5over polysaccharide. 60%ofthelengthofthesequence—seeMethods)inT.maritima.Of Along with an ability to gain energy through a fermentative these families, 126 consist of two members, and the largest gene metabolism, T. maritima can grow as a respiratory organism, family (of ATP-binding subunits of ABC transporters) contains generatingenergyinthepresenceofFe(III)(ref.17).Growthwith 67members.Twootherlargefamilies(onewith 22membersand sulphurastheterminalelectronacceptordoesnotproduceATP18, one with 47 members) consist exclusively of proteins involved in transport. Fifteen families, unique to T. maritima, consist of Figure2LinearrepresentationoftheT.maritimaMSB8genome.Thelocationsof (cid:81) proteins with no database match. Eleven of these families each predicted protein-coding region (colour-coded by biological role), RNA have two members, and the largest group has seven members genes,tRNAsandrepeatelementsareindicated.Arrowsrepresentthedirection (http://www.tigr.org/tdb/mdb/mdb.html). oftranscriptionforeachpredictedcodingregion.NumbersnexttothetRNA symbols represent the number of tRNAs at a locus. Numbers next to GES Soluteuptakeandmetabolism representthenumberofmembrane-spanningdomainspredictedbytheGold- Severaltransportersreflectingtheheterotrophicmetabolismofthis man,Engelman,andSteitzscaleascalculatedbyTopPred45forthatprotein.Only speciesarepresent,includingthoseforimportingmaltose(malE), proteins with five or more GES domains are shown. Presumed transporter ribose(rbsB)andspermidineand/or putrescine(potD),aswellas specificityisindicatedabovepredictedcodingregionsidentifiedastransporters. several carbohydrate transporters whose specificity is unknown. Transporterabbreviationsareasfollows:+,cations;H+,protons;K+,potassium;Pi, Carriersfortheuptakeofaminoacidsandoligopeptides,andion- phosphate;Zn,zinc;aa,aminoacids;Na+,sodium;COH,sugar;aaX,oligopep- transportsystemsfortheacquisitionofK+(trkA/H),Mg2+(mgtE), tides; mal, maltose; rib, ribose; s/p, spermidine/putrescine; ura, uracil; ant, NH+4 (amt), PO24- (pstA/C/B/S) and both oxidized and reduced antibiotics;Fe2+,iron(II);Fe3+,iron(III);NH+4,ammonium;bcaa,branchedchain formsofiron(feoA/B)arepresent.Thelargenumberoftransporters aminoacids;g3Pi,glycerol-3-phosphate;glyc,glycerol;chro,chromate;Mg2+, for carbohydrates and amino acids (Figs 3, 4) suggests that the magnesium;questionmarks(?)indicatewheresubstratespecificityisuncertain environment in which T. maritima is found is rich in organic or unknown. Members of paralogous gene families are identified by family material. numberinaboxabovethepredictedcodingregion. © 1999 Macmillan Magazines Ltd 324 NATURE|VOL399|27MAY1999|www.nature.com articles butthispathwayallowsfortheeliminationofgrowthinhibitoryH Motility is regulatedbya two-componenthistidine kinasesignal- 2 which isproducedduring fermentativegrowth.Various flavopro- transductionpathwaythatisassembledfromproductsoftheChe teins and iron-sulphur proteins have been identified as potential genes(cheA/B/C/D/R/W/Y)andsevenmethyl-acceptingchemotac- electroncarriers. tictransducerproteins(MCPs).GenesencodingT.maritimaMCPs are most closely related to homologues in Bacillus subtilis, with Responsetoenvironmentalstimuli specificity for both amino acids and carbohydrates. Although T.maritimademonstratesacarbohydrate-dependentthermotactic T.maritimademonstratesnochemotacticresponsetoserine19,the responsetotemperaturegradientsbetween50and105(cid:56)C(ref.19). presence of amino-acid-specific MCPs indicates that T. maritima mayrespondtoaspartate,threonineandglycine,whichappeartobe catabolizedbythisbacterium. (cid:82) Table2T.maritimaMSB8genelist Inadditiontothechemotactic-responsekinases,othermembers GeneidentificationnumbersthatcorrespondtothoseinFig.2arelistedherewiththe of the two-component signalling family identified in T. maritima prefixTMfollowedbythecommonnameassignedtoeachprotein,thethree-orfour- arelikelytohavearoleinmonitoringandrespondingtoenviron- lettergenenameinparentheses,theorganismwiththemostsignificantmatchin mentalstimulisuchastemperatureandnutrients.Thisgroupofsix bracesandthepercentsimilaritytothebestmatch.Eachgeneidentifiedislistedinits functionalrolecategory(adoptedfromref.10).Incaseswherethesubstrate signalling partners includes the previously identified histidine- specificityofaproteincouldnotbeunambiguouslydetermined,amoregeneral kinase (hpkA)/response-regulator (drrA) pair belonging to the commonnamewasusedandnogenenamewasassigned.Insomecasesagene OmpR–PhoBsubfamilyoftranscriptionalregulators20. withoutknownsubstratespecificitycouldbeconfidentlyassignedtoaparticular familyasfoundinPROSITE(http://expasy.hcuge.ch/sprot/prosite.html/)orSWISS- Althoughnoheat-orcold-shockresponsehasbeenexperimen- PROT(http://expasy.hcuge.ch/sprot/).Theterm‘related’isusedintwowaysinthe tally demonstrated in T. maritima, the genome contains genes commonnames:(1)whentheTMproteinisapartialbutsignificantmatchtoa databaseprotein:theTMproteinisassignedtotheUnknownrolecategory;or(2) encodingheat-shockproteins(dnaJ/K,groEL/ES,grpE,hslU/Vand whentheTMproteinisaverygoodmatchtoadatabaseproteinwhosefunctionis hsp)andcold-shockproteins(encodedbycspB/L),whichprobably notfoundinT.maritima:theTMproteinmaybeassignedtoarolecategory help regulate responses to changes in ambient temperature and appropriatetotheknownfunctionofthedatabasematch.Abbreviationsareas follows:Commonnames:AA,aminoacid;NH3,ammonia;NH4+,ammonium;AFS, growth conditions. T. maritima also contains genes encoding a authenticframeshift;APM,authenticpointmutation;Bprt,bindingprotein;BRAA, generalstressprotein(ctc)andastationary-phase-survivalprotein branchedchainAA;Cl- ,chloride;CoA,coenzymeA;DHase,dehydrogenase;dep, (surE),bothofwhicharepresumablyinvolvedinstresssurvival. dependent;elong,elongation;fam,family;flgr,flagellar;init,initiation;Fe,iron;Fe3+, iron(III);Fe2+,iron(II);LPS,lipopolysaccharide;Mg2+,magnesium;Mn2+, manganese;OP,oligopeptide;PPase,phosphatase;P,phosphate;PPR, Cellularactivities phosphoribosyl;K+,potassium;prt,protein;put,putative;RDase,reductase;reg, Proteinsecretion,competenceandtransformation.Inadditionto regulation,regulator,regulatory;rel,related;ssDNA,singlestrandedDNA;Na+, sodium;S,sulfur;sub,subunit;Sase,synthase,synthetase;term,termination;Tase, the Sec-A-dependent secretory pathway, T. maritima has two transferase;transp,transporter;Zn,zinc.Organismofbestmatch:Ac,Acinetobacter specialized export systems. The first, which is assembled from calcoaceticus;Ab,Agaricusbisporus;Ar,Agrobacteriumradiobacter;At, homologuesofFliH/P/RandFlhA/B21,isprobablyassociatedwith Agrobacteriumtumefaciens;Ae,Alcaligeneseutrophus;Alc,Alcaligenessp.;Alt, Alteromonassp.;Amy,Amycolatasp.;Ana,Anabaenasp.;Ath,Anaerocellum thesecretionandassemblyofthesingleT.maritimaflagellum19.The thermophilum;Aa,Aquifexaeolicus;Atl,Arabidopsisthaliana;Af,Archaeoglobus second is a type-II secretion pathway that is assembled from the fulgidus;Art,Arthrobactersp.;Aca,Azorhizobiumcaulinodans;Av,Azotobacter generalsecretionpathwayproteinsD,EandF(homologuesofthe vinelandii;Bbv,Bacillusbrevis;Bca,Bacilluscaldolyticus;Bce,Bacilluscereus;Bci, Bacilluscirculans;Bf,Bacillusfirmus;Bl,Bacilluslicheniformis;Bm,Bacillus PulDtransportfamilyofKlebsiellaoxytoca22).TheT.maritimatype- megaterium;Bac,Bacillussp.;Bsp,Bacillussphaericus;Bst,Bacillus IIsecretionpathwayprobablyservesastheprimarymechanismfor stearothermophilus;Bs,Bacillussubtilis;T4,BacteriophageT4;Bn,Bacteroides nodosus;Bt,Bacteroidesthetaiotaomicron;Bv,Betavulgaris;Bp,Bordetella secretion of degradative enzymes required for the utilization of pertussis;Bb,Borreliaburgdorferi;Bbr,Brevibacillusbrevis;Bli,Brevibacterium polysaccharides. linens;Ba,Brucellaabortus;Ce,Caenorhabditiselegans;Cj,Campylobacterjejuni; CompetencehasnotbeendemonstratedinT.maritima,butitis Ca,Candidaalbicans;Ch,Caprahircus;Cc,Caulobactercrescentus;Cp,Chlamydia psittaci;Cr,Chlamydomonasreinhardtii;Cau,Chloroflexusaurantiacus;Cac, possiblethatthetype-IIgeneralsecretionpathwayproteinsD,EandF, Clostridiumacetobutylicum;Cla,Clostridiumacidiurici;Chi,Clostridiumhistolyticum; andthetypeIVpilinleaderpeptidase,typeIVpilin-relatedprotein Cpa,Clostridiumpasteurianum;Cpe,Clostridiumperfringens;Clo,Clostridiumsp.; andpilTidentifiedinT.maritimamayfunctioninnaturalcompetence Ct,Clostridiumthermocellum;Cam,Cornyebacteriumammoniagenes;Dd, Desulfovibriodesulfuricans;Df,Desulfovibriofructosovorans;Dg,Desulfovibrio andtransformation,asdescribedforotherorganisms23.Inaddition,T. gigas;Dv,Desulfovibriovulgaris;Dt,Dictyoglomusthermophilum;Eco,Eikenella maritimahashomologuesofthecompetencegenesdprA,comMand corrodens;Ea,Enterobacteraerogenes;Ef,Enterococcusfaecalis;Ech,Erwinia comEofHaemophilusinfluenzae,andcomEAandcomFCofB.subtilis. chrysanthemi;Ec,Escherichiacoli;Eac,Eubacteriumacidaminophilum;Eg,Euglena gracilis;Fi,Fervidobacteriumislandicum;Hi,Haemophilusinfluenzae;Hp, Thereareotherprotein-codingsequenceswithweakhomologyto Helicobacterpylori;Hs,Homosapiens;Kp,Klebsiellapneumoniae;Lf,Lactobacillus competencegenesfromB.subtilis.Thissuggeststhattheremaybe fermentum;Lp,Lactobacilluspentosus;Ll,Lactococcuslactis;Lpn,Legionella aninherentsystemfor theuptakeofexogenous DNA, facilitating pneumophila;Lm,Leptospirameyeri;Lmo,Listeriamonocytogenes;Mc, Mesembryanthemumcrystallinum;Mta,Methanobacteriumthermoautotrophicum; geneticexchangebetweenT.maritimaandotherorganisms. Mtf,Methanobacteriumthermoformicicum;Mj,Methanococcusjannaschii;Mk, Transcriptionandtranslation.Genesencodingthethreesubunits Maeeruthgyilnoobsaac;tMeroiu,mMeicxrtoomrqouneonssp;oMraluo,Mliviacsrotecroocscpuosral;uMtemusu;,MMau,sMmicurosccuyslutiss;Ml, ((cid:97),(cid:98),(cid:98)(cid:57))ofthecoreRNApolymerasewereidentifiedinT.maritima Mycobacteriumleprae;Ms,Mycobacteriumsmegmatis;Mt,Mycobacterium (rpoA/B/C,respectively)alongwithfour(cid:106)factors;(cid:106)A(alsonamed tuberculosis;Mg,Mycoplasmagenitalium;Mp,Mycoplasmapneumoniae;Mx, (cid:106)70)(rpoD),(cid:106)E(rpoE),(cid:106)H(fliA)and(cid:106)28(spoOH).Although(cid:106)Ahas Myxococcusxanthus;Nf,Naegleriafowleri;Ng,Neisseriagonorrhoeae;Nos,Nostoc sp.;Pd,Paracoccusdenitrificans;Pha,Pasteurellahaemolytica;Pan,Podospora beenpreviouslyidentifiedinT.maritima24,therolesandspecificity anserina;Pg,Porphyromonasgingivalis;Pm,Propionigeniummodestum;Pa, oftheremaining(cid:106)factorsintranscriptionregulationareunknown. Pseudomonasaeruginosa;Pc,Pseudomonascichorii;Pf,Pseudomonas ThenusA/B/Gtranscriptionantiterminationgenesandamemberof fluorescens;Pp,Pseudomonasputida;Pse,Pseudomonassp.;Ps,Pseudomonas stutzeri;Psy,Pseudomonassyringae;Pfu,Pyrococcusfuriosus;Ph,Pyrococcus the greA/B transcription-elongation-factor family were identified horikoshii;Pyr,Pyrococcussp.;Rn,Rattusnorvegicus;Ra,Reclinomonasamericana; alongwiththerhotranscription-terminationfactor. Rc,Rhodobactercapsulatus;Sc,Saccharomycescerevisiae;Se, ThegenomeofT.maritimaissimilartoothersequencedbacterial Saccharopolysporaerythraea;Sch,Salmonellacholeraesuis;Sd,Shigella dysenteriae;Sa,Staphylococcusaureus;Sx,Staphylococcusxylosus;Sau, genomesinthatthegeneforglutaminyltRNA-synthetaseismiss- Stigmatellaaurantiaca;Sb,Streptococcusbovis;Sm,Streptococcusmutans;Sp, ing. An alternative synthesis mechanism is the transamidation of Streptococcuspneumoniae;St,Streptococcusthermophilus;Sco,Streptomyces Glu-tRNAGln to Gln-tRNAGln byGlu-tRNAGln amidotransferase, a coelicolor;Ss,Sulfolobussolfataricus;Ssc,Susscrofa;Syn,Synechococcussp.; SPCC,SynechocystisPCC6803;Scy,Synechocystissp.;Tb,Thermoanaerobacter heterotrimeric enzyme found in both Eubacteria and Archaea25. brockii;Tth,Thermoanaerobacteriumthermosaccharolyticum;Tl,Thermococcus Like Helicobacter pylori26, T. maritima lacks an asparaginyl-tRNA litoralis;Tm,Thermotogamaritima;Tn,Thermotoganeapolitana;Ta,Thermus synthetase.TransamidationofAsp-tRNAAsnispresumablyinvolved aquaticus;Tt,Thermusthermophilus;Td,Treponemadenticola;Tp,Treponema pallidum;Va,Vibrioalginolyticus;Vc,Vibriocholerae;Vp,Vibrioparachaemolyticus; in generation of Asn-tRNAAsn, similar to that reported in the .W....s..,...W....o..l..i.n..e...l.l.a....s..u...c..c...i.n...o...g..e...n..e...s..;...X...l.,..X...e...n..o...p...u..s....l.a...e..v...i.s..;..Y...e..,...Y..e...r.s...i.n...i.a...e...n...t.e...r.o...c...o..l.i.t..i.c..a..................... archaeonHaloferaxvolcanii27. © 1999 Macmillan Magazines Ltd NATURE|VOL399|27MAY1999|www.nature.com 325 articles 11 oligopeptide ABC transport systems 1 2 3 4 5 6 7 8 9 10 11 livF/G/H/M branched chain amino acids 1 GLUCOSE GLUCONATE ENTNER amino acids DOUDOROFF a PATHWAY Glycine potA/B/C/D 2 PENTOSE PHOSPHATE ? glucose-6-P 6 phosphogluconate Acetamide spermidine/ PATHWAY Threonine putrescine ? ? ? G KDPG pstA/B/C/S 3 gly-3-P Fructose-6-P LYCO NH3+CO2+H2 phosphate C transport systems 45 cygslyteciinnee ? serine ? PELYSISP ? DHcAhPorismapgtgyelrlyyu-+c3vea-Prtoel-tt3ry-yrPopstoinpehanglycerol 3m3o ftloarg geellanre &s FLApu cGphrhyaEorrcPoLsimLplUhaaMtete ? ? ugar AB 6 ASPARTATE MALATE pPoyrruvate voaxlainloeacetate aspar?tate cheA/B/C/D/R/W/Y Mzngu 2A+/ B? /C 9 s H2 and CO2 Acetyl-CoALACTATE threonine 7 MCPs zinc 7 potassium ? OR histidine arginine? 2-OXOGLUTARATE glutamate proline PRPP iron (III) 8 glutamine ALDEHYDES chemotactic KETOISOVALERATE leucine Ribose-5-P signals iron (III) ? 9 ADP + Pi ATP PPi 2Pi cations cations P-type ATPase rbsrAib/oBs/Ce/D mmaalElt/oFs/eG mmaalElt/oFs/eGgulygcpeAro/Bl-3/E-P gulgyplctpaeFkreol antibiotics ? AHT+P? pHyr+o?- NaHm4t+ pottraksAs/Hium ifreoonA (/IBI) oaN d aA+/B facilitator synthase phosphatase Figure3OverviewofmetabolismandtransportinT.maritimaMSB8.Pathways composite figures of ovals, diamonds and circles. All other transporters are forenergyproductionandthemetabolismoforganiccompounds,acidsand drawnasrectangles.Exportorimportofsolutesisdesignatedbythedirectionof aldehydesareshown.Eachgeneproductwithapredictedfunctioninionor thearrowsthroughthetransporter.Ifprecisesubstratespecificitycouldnotbe solutetransportisillustrated.Transportersaregroupedbysubstratespecificity determinedforatransporter,nogenenamewasassignedandamoregeneral according to role category: cations (green), anions (red), carbohydrates commonnamereflecting the type of substratebeingtransported was used. (yellow),purines(purple),aminoacids/peptides/amines(darkblue)andother Abbreviations:PRPP,phosphoribosyl-pyrophosphate;gly,glyceraldehyde;PEP, (lightblue).Questionmarksassociatedwithtransportersindicateuncertaintiesin phosphoenolpyruvate; ATP, adenosine triphosphate; ADP, adenosine diphos- substratespecificityordirectionoftransport.Questionmarksassociatedwith phate;DHAP,dihydroxyacetonephosphate;OR,oxidoreductases;MCP,methyl- metabolic pathways indicate where an expected activity was not found. acceptingchemotaxisprotein;KDPG,2-keto-3-deoxy-6-phosphogluconate;por, Permeases are represented by ovals. ABC transport systems are shown as pyruvateoxidoreductase. Celldivision.ThegenecontentofT.maritima demonstratesthat Tocapitalizeontheinformationcontainedincompletedgenome thebasicmechanismofcelldivisionissimilartothatfoundinother sequences,andtoreducecomplicationscausedbydifferentspecies Eubacteria. The ftsA/H/Y/Z genes were identified along with two sets in phylogenetic analyses of different genes29, we identified a genesfromtheminlocus,minC/D.Thesegenesfunctiontogetherto subset of 33 genes (see Methods) for which homologues were specifythepositionandformationoftheconstrictingringthatleads conserved in all species sequenced to date4,5,8,11,13,26,30–39 (Table 3). tofinaldivision. Thissubsetwasusedtogeneratemultiplesequencealignmentsand Detoxification. Proteins for detoxification in this strict anaerobe phylogenetictreesusingbothparsimonyanddistancemethods(see includetwoNADHoxidases(nox),aputativealkylhydroperoxide Methods). reductase, a heavy-metal-resistance transcriptional regulator and A few significant patterns arise from the analysis of these theperiplasmicdivalentcation-toleranceprotein(cutA). phylogenies.First,forthemajorityofgenes,theArchaeaconstitute a distinct monophyletic group separate from the Eubacteria, a Phylogeneticsandcomparativegenomics phylogeneticpatternalsofoundinSSUrRNAtrees.However,this TheavailabilityofthecompletegenomesequenceofT.maritimais finding does not relate to whether the Archaea as a whole are important for evolutionary studies, as T. maritima has been sug- monophyletic, as the species of Archaea represented here are all gestedtobeoneofthedeepestbranchingeubacterialspecies,onthe Euryarchaeota. Second, most yeast genes group with the archaeal basisofphylogeneticanalysisofSSUrRNAgenes3.AlthoughSSU genesasfoundforSSUrRNA.Inaddition,inalmostalltrees,species rRNA analysis has been useful in identifying and characterizing that are part of the same bacterial phyla group together (for bacterialstrainsandspecies,manyquestionsremainregardingthe example, Escherichia coli and H. influenzae, Borrelia burgdorferi validity of evolutionary conclusions based on SSU rRNA analysis andTreponemapallidum).Beyondthis,therearesignificantdiffer- (see,forexample,ref.28). ences in the topologies of different genes, and there is little © 1999 Macmillan Magazines Ltd 326 NATURE|VOL399|27MAY1999|www.nature.com articles Amino acids almost half to P. horikoshii. This similarity of T. maritima to the SugarTM0593 Oligopeptides ArchaeacontrastswiththeotherEubacteriainwhichnomorethan STMug0a4r32TM0810 TM1067O TlMig0o0p5e6ptiOdleigsopeptides 16%ofthecodingsequencesinA.aeolicus,and7%ofthecoding Zinc TM0309 sequences in B. subtilis, are most similar to archaeal proteins. By TM0123 Oligopeptides Gly-3-P TM1746 whole-genomesimilaritycomparison,T.maritimaappearstobethe TM1120 Oligopeptides mostArchaea-likeofallsequencedEubacteria4,11,26,31–38. TM1150 STMug0a5r95 Oligopeptides The Archaea-like nature of the T. maritima genome does not TM0531 necessarily reflect a closely shared common ancestor between Sugar TM1855 Oligopeptides T. maritima and the Archaea, as the extensive similarity between TM0300 thesespeciescouldhaveariseninmanyways.Theseincludelossof Maltose TM1204 Oligopeptides these genes in other lineages, extensive sequence divergence in TM0071 Maltose mesophilic Eubacteria (coupled with a retentionof such genes in TM1839 Oligopeptides TM1199 thethermophilicArchaeaandT.maritima)and,alternatively,thata Sugar Oligopeptides few genes in the T. maritima genome with strong similarity to TM0277 TM0031 archaealgeneshaveexpandedintolargegenefamilies,resultingin Sugar Oligopeptides TM0418 TM1223 more genes with a higher level of similarity to archaeal genes. Spermidine/ Oligopeptides Such mechanisms could lead to similarity between Archaea and putrescine TM1226 TM1375 T.maritimaregardlessoftheevolutionaryhistoryofthesespecies. PTMho1s2p6h4ate SugarRTMib0o9s5e8 We believe, however, that much of the similarity between Branched Iron Iron TM0114 T.maritimaandtheArchaeaisduetosharedancestryofportions chain TM0080TM0189 amino acids ofthegenomeasaresultofextensivelateralgenetransferbetween TM1135 these lineages (Tables 3, 4). One line of evidence consistent with Figure 4 Phylogenetic pattern of the periplasmic SBP component of the previousstudieswhichhavearguedinsupportoflateraltransfer40is oligopeptide transporters and other transporters present on the T. maritima that the 451 Archaea-like genes in T. maritima are not uniformly MSB8genome.Sperm/putre,spermidine/putrescine;Gly-3-P,glycerol-3-phos- distributed among the biological role categories (Table 4). The phate;sugar,unsureofspecificityofsugartransporter. majority of genes involved in housekeeping functions such as transcription, translation, DNA replication and cell division are agreementbetweenthesegenetreesandtherRNAtree.Inparticular, mostsimilartoorthologuesineubacterialspecies.Incontrast,49% wefindlittlesupportfortherRNA-basedpositionsofAquifexand of transporters (92), 60% of electrontransport proteins (28) and Thermotoga. 42%ofconservedhypothetical proteins(173)aremostsimilar to Thelackof congruence of treesfordifferent genes could result archaeal genes. Another observation which would support lateral fromthelimitednumberofspeciesrepresentedbythecompleted genetransferisthat81oftheArchaea-likegenesintheT.maritima genomesand/orthesmallsizeofsomeofthesegenes.However,we genomeareclusteredin15regionsofthechromosomethatrangein believethatthedifferencesarerealbothbecauseofhighbootstrap sizefrom(cid:44)4to20kb.Thegenes,andconservationofgeneorderin support and because differences in topology have been found by seven of these regions, have only been described in the genome othersintheanalysisofothergenes28.Mechanismsthatcouldlead sequencesofthermophilicArchaea.Inaddition,twooftheclustered to these differences include gene duplication, gene loss and hori- regionsareassociatedwiththe30-bprepeatelementsfoundamong zontalgenetransfer.Thusweconcludethat,basedonsingle-gene Archaea and T. maritima, lending support to the idea that these analysis,thephylogeneticpositionofAquifexandThermotoga,and repeatelementsmaybeinvolvedingenetransfer. thenatureofthedeepestbranchingeubacterialspecies,shouldbe (cid:120)2-analysis of the genome sequence (see Methods) lends addi- consideredtobeambiguous. tionalsupporttothetheoryoflateralgenetransferinT.maritima. Phylogeneticanalysisofindividualgeneshaslimitedevolution- Based on the assumption that the DNA composition is relatively arystudiestoasmallpercentageofthegenesinanygivengenome. uniformthroughoutthegenome,thereareatleast51regionsinthe Asanalternativetosingle-genephylogeneticanalysis,wecompared chromosome(Fig.1)thathaveasignificantlydifferentcomposition T. maritima’s genome sequence to those of the completely (P(cid:35)1:9(cid:51)10(cid:50)9). Forty-two of these regions include genes and sequenced microbial species by observing patterns of similarity repeatstructuresthathavehighestlevelsofsimilaritytoregionson (see Methods). Of the 1,877 predicted coding sequences in the the chromosomes of other thermophiles, including the thermo- T.maritimagenome,52%aremostsimilartoproteinsineubacterial philicArchaeaandAquifex.Allofthe30-bpsmallrepeatareashavea species,mosttotheGram-positivebacteriumB.subtilis(21%)and (cid:120)2 composition that is substantially differentfrom the rest of the to A. aeolicus (15%). In addition, 24% of the predicted coding sequencesaremostsimilartoproteinsinarchaealspecies(Table4), Table4TopeubacterialorarchaealmatchinT.maritimabyroleID Rolecategory Eubacteria Archaea Eukaryotes None ............................................................................................................................................................................. Table3GenessharedbytheT.maritimaMSB8genome Aminoacidbiosynthesis 49 20 2 1 Purines,pyrimidines,etc. 32 11 2 0 33homologuesinthecompletedgenomes Fattyacidandphospholipidmetab. 12 3 0 0 apt,argS,eno,ftsZ,gcp,gltX,groES,hisT,pheS,pgk,prsrplA,rplB,rplC,rplE,rplF, Biosynthesisofcofactorsetc. 22 10 0 0 rplK,rplN,rplR,rplV,rpsB,rpsC,rpsD,rpsE,rpsG,rpsH,rpsJ,rpsK,rpsL,rpsM,rpsQ, Centralintermediarymetabolism 27 12 1 3 rpS,secY. Autotrophicmetabolism 0 0 0 0 ............................................................................................................................................................................. Energymetabolism 117 56 1 18 T.maritimagenesmatchingonlyhyperthermophiles: Transport 89 92 0 7 108total;93hypotheticalproteins;flgAputative;s-layer-relatedprotein;rubrerythrin DNAmetabolism 45 6 0 2 putative;2ABCtransporters;glutamatesynthase-relatedprotein;glutaredoxin Transcription 17 0 0 0 putative;putativeglyceratekinase;putativehydrogenase;putativeNADH Translation 118 6 0 6 dehydrogenase;putativeLPSbiosynthesisprotein;putativephosphonopyruvate Regulatoryfunctions 61 9 0 0 decarboxylase;putativepyruvateformatelyaseactivatingenzyme;alanine Cellenvelope 57 9 1 7 acetyltransferase-relatedprotein;sensoryboxprotein. Cellularprocesses 55 10 0 0 ............................................................................................................................................................................. T.maritimagenesmatchingonlyArchaea Other 9 8 0 1 71total;64hypotheticalproteins;2ABCtransporters;glutamatesynthase-related Hypotheticals 213 173 2 19 Unknown 52 26 0 5 protein;putativeglyceratekinase;putativehydrogenase;rubrerythrin;sensorybox protein. TOTAL 975 451 9 69 ............................................................................................................................................................................. ............................................................................................................................................................................. © 1999 Macmillan Magazines Ltd NATURE|VOL399|27MAY1999|www.nature.com 327 articles lamAbglB 24 25 26 27 28 29 30 31 32 33 34 feoAfeoB aguA xynA uxuA xynB xloAaxeA 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 xylQ 296297 298 299 300 301 302 303 304 305 306 307 308 309 310 10621063 1064 1065 1066 10671068 10691070107110721073 1148 1149 1150 1151 1152 1153 1154 1155 galK galTgalA malGmalFmalE 1190 11911192 1193 1194 1195 11961197 1198 1199 1200 12011202 12031204 1218 12191220 1221 12221223 1224 1225 1226 1227 17461747174817491750 17511752 1753 Figure5Linearrepresentationofthelocationofnineoligopeptidetransporter transportproteinA;feoB,iron(II)transportproteinB;galA,(cid:97)-galactosidase;galK, operons on the T. maritima MSB8 genome. Arrows represent functional galactokinase;galT,galactose-l-phosphateuridylyltransferase;lamA,laminari- categories as follows: black, sugar metabolism; red, transporters; orange, nase;malE/F/G,maltoseABCtransporter;uxuA,D-mannonatehydrolase;xloA, regulators;blue,hypothetical;yellow,conservedhypothetical;grey,other.aguA, xylosidase;xylQ,(cid:97)-xylosidase;xynA,endo-1,4-(cid:98)-xylanaseA;xynB,endo-1,4-(cid:98)- (cid:97)-glucuronidase;axeA,acetylxylanesterase;bglB,(cid:98)-glucosidase;feoA,iron(II) xylanaseB.Arrowsarenotproportionaltothesizeofpredicted-codingregions. genome. Compositional bias as seen in these (cid:120)2 distributions has Forthecomparativegenomics,theT.maritimaORFswereaddedtoasetof previouslybeenusedinsupportoflateralgenetransfer41. allORFsfrom16publishedmicrobialgenomes4,5,8,11,13,26,30–39.Thisdatasetwas In summary, completion of the sequence of the T. maritima searchedagainstitselfusingBLASTX43,andpair-wisematcheswereidentified, genomehasrevealedadegreeofsimilaritywiththeArchaeainterms clusteredandconvertedtomultiplesequencealignmentsasabove.Fromthis of gene content and overall genome organization that was not setofalignmentsasubsetof33homologousgenefamilies(P(cid:35)10(cid:50)5over60% previously appreciated. Although the core of T. maritima may be ofthequerysearchlength)wereidentified.Thesealignmentswereenhanced eubacterial,almostonequarterofthegenomeisarchaealinnature. using the CLUSTAL X program, and regions of the alignments that were The mosaic nature of T. maritima appears to be the result of ambiguous, hypervariable or that contained large alignment gaps were extensivelateralgenetransferwiththeArchaea.Theaccumulating excluded from subsequent phylogenetic analysis. Phylogenetic trees were evidencefor the high frequencyof lateral transferevents, and the generated from the curated alignments using the PAUP 4.0.0d64 and lack of congruence among the phylogenies of different genes, PHYLIP computer programs. Parsimony analysis was conducted using the indicates that the emphasis of phylogenetic studies on individual heuristicsearchalgorithmofPAUPandprotparsofPHYLIP.Distance-based genesasindicatorsoforganismalevolutionisprobablynotaccurate. treeswere generatedusingtheneighbour-joiningalgorithm of ref.44.One Undoubtedly, our understanding of the complex relationships hundred bootstrap replicates were conducted for all trees. For all PAUP among prokaryotes will continue to increase as other microbial analyses,multiplestepmatriceswereusedincalculationofdistancesandin genomesequencingprojectsarecompleted. (cid:77) parsimonyanalysis.ThewholegenomesetofpairwiseBLASTX43searchresults ......................................................................................................................... wasalsousedtodetermine‘besthits’ofT.maritimaandA.aeolicustoother Methods genomes. Whole-genome random sequencing procedure. The type strain, Forthe(cid:120)2analysiswecomputedthedistributionofall64trinucleotides T. maritima MSB8, was grown from a culture derived from a single cell (3mers)forthecompletegenome,andthencomputedthe3merdistributionin isolated by optical tweezers and provided by R. Huber (University of 2,000bpwindowsacrossthegenome.Weusedwindowsthatoverlappedbyhalf Regensburg).Cloning,sequencingandassemblywereasdescribedforgenomes theirlength;thatis,1,000bp.Foreachwindow,wecomputedthe(cid:120)2statisticon sequenced by TIGR4,5. One small-insert plasmid library (1.5–2.5kb) was thedifferencebetweenits3mercontentandthatofthewholegenome.Alarge generated by random mechanical shearing of genomic DNA. One large- valueofthisstatisticmeansthatthecompositionwithinthewindowisdifferent insertlambdalibrarywasgeneratedbypartialTsp5091digestionandligation fromtherestofthegenome.Figure1illustratesthoseregionsofthegenome to (cid:108)-DASHII/EcoRI vector (Stratagene). In the initial random sequencing with(cid:120)2valueswhoseprobabilityislessthan1:9(cid:51)10(cid:50)9;theprobabilityvalues phase,(cid:44)7-foldsequencecoveragewasachievedwith27,789sequencesfrom fortheseregionsarebasedontheassumptionthattheDNAcompositionis plasmidclones(averagereadlength,531bases).Theplasmidand(cid:108)-sequences relativelyuniformthroughoutthegenome.Becausethisassumptionmaybe werejointlyassembledusingTIGRAssembler42.Sequencesfrombothendsof incorrect,weprefertointerprethigh(cid:120)2valuesmerelyasindicatorsofregions 546(cid:108)-clonesservedasagenomescaffold,verifyingtheorientation,orderand onthechromosomethatappearunusualandthatdemandfurtherscrutiny. integrity of the contigs. Sequence gaps were closed by editing the ends of ORFpredictionandgenefamilyidentification.AninitialsetofORFslikely sequencetracesand/orprimerwalkingonplasmidclones.Physicalgapswere toencodeproteinswasidentifiedbyGLIMMER9,andthoseshorterthan30 closedbydirectsequencingoffgenomicDNA,orcombinatorialPCRfollowed codonswereeliminated.ORFsthatoverlappedwerevisuallyinspected,andin bysequencingofthePCRproduct.Thefinalgenomesequenceisbasedon somecasesremoved. ORFs were searchedagainst a non-redundantprotein 30,140sequences. database as previously described4. Frameshifts and point mutations were ParalogousgenefamilieswereconstructedbysearchingtheORFsagainst detectedandcorrectedwhereappropriateasdescribedpreviously26.Remaining themselvesusingBLASTX43,identifyingpairwisematchesaboveP(cid:35)10(cid:50)5over frameshiftsand point mutations areconsideredto be authentic and corre- 60%ofthequerysearchlength,andsubsequentlyclusteringthesematchesinto spondingregionswereannotatedas‘authenticframeshift’or‘authenticpoint multigene families.Multiple sequencealignmentsfor theseprotein families mutation’respectively.TwosetsofhiddenMarkovmodels(HMMs)wereused weregeneratedusingMSA(G.G.Sutton&T.Bussey,personalcommunica- todetermineORFmembershipinfamiliesandsuperfamilies.Theseincluded tion),anannealingalgorithm,andthealignmentswerescrutinized. 527 HMMs from pfam v2.0, and 199 HMMs from the TIGR orthologue © 1999 Macmillan Magazines Ltd 328 NATURE|VOL399|27MAY1999|www.nature.com articles resource. TopPred45 was used to identify membrane-spanning domains 25.Zhang,J.,Hardham,J.,Barbour,A.&Norris,S.AntigenicvariationinLymediseaseBorreliaeby (MSDs)inproteins. promiscuousrecombinationofVMP-likesequencecassettes.Cell89,275–285(1997). 26.Tomb,J.-F.etal.ThecompletegenomesequenceofthegastricpathogenHelicobacterpylori.Science 388,539–547(1997). Received9November1998;accepted31March1999. 27.Curnow,A.W.,Ibba,M.&Soll,D.tRNA-dependentasparagineformation.Nature382,589–590 1. Huber, R. et al. Thermotoga maritima sp. nov. represents a new genus of unique extremely (1996). thermophiliceubacteriagrowingupto90(cid:56)C.Arch.Microbiol.144,324–333(1986). 28.Brown,J.R.&Doolittle,W.F.Archaeaandtheprokaryote-to-eukaryotetransition.Microbiol.Mol. 2. Huber,R.&Stetter,K.O.inTheProkaryotes(edsBalows,A.etal.)3809–3815(Springer,Berlin, Biol.Rev.61,456–502(1997). Heidelberg,NewYork,1992). 29.Eisen,J.A.TheRecAproteinasamodelmoleculeformolecularsystematicstudiesofbacteria: 3. Achenbach-Richter,L.,Gupta,R.,Stetter,K.O.&Woese,C.R.Weretheoriginaleubacteria comparisonoftreesofRecAsand16SrRNAsfromthesamespecies.J.Mol.Evol.41,1105–1123 thermophiles?Syst.Appl.Microbiol.9,34–39(1987). (1995). 4. Fleischmann,R.D.etal.Whole-genomerandomsequenceofHaemophilusinfluenzaeRd.Science269, 30.Smith,D.R.etal.CompletegenomesequenceofMethanobacteriumthermoautotrophicum(cid:68)H: 496–512(1995). Functionalanalysisandcomparativegenomics.J.Bacteriol.179,7135–7155(1997). 5. Klenk,H.-P.etal.Thecompletegenomesequenceofthehyperthermophilic,sulphate-reducing 31.Kunst,F.etal.Thecompletegenomesequenceofthegram-positivebacteriumBacillussubtilis.Nature archaeonArchaeoglobusfulgidus.Nature390,364–370(1997). 390,249–256(1997). 6. Lobry,J.R.AsymmetricsubstitutionpatternsinthetwoDNAstrandsofbacteria.Mol.Biol.Evol.13, 32.Fraser,C.M.etal.GenomicsequenceofaLymediseasespirochete,Borreliaburgdorferi.Nature390, 660–665(1996). 580–586(1997). 7. Salzberg,S.,Salzberg,A.,Kerlavage,A.&Tomb,J.-F.Skewedoligomersandoriginsofreplication. 33.Blattner,F.R.etal.ThecompletegenomesequenceofEscherichiacoliK-12.Science277,1453–1462 Gene217,57–67(1998). (1997). 8. Bult,C.J.etal.CompletegenomesequenceofthemethanogenicarchaeonMethanococcusjannaschii. 34.Cole,S.T.etal.DecipheringthebiologyofMycobacteriumtuberculosisfromthecompletegenome Science273,1058–1073(1996). sequence.Nature11,537–544(1998). 9. Salzberg,S.L.,Delcher,A.L.,Kasif,S.&White,O.Microbialgeneidentificationusinginterpolated 35.Fraser,C.M.etal.TheminimalgenecomplementofMycoplasmagenitalium.Science270,397–403 Markovmodels.NucleicAcidsRes.15,544–548(1998). (1995). 10.Riley,M.FunctionsofgeneproductsofEscherichiacoli.Microbiol.Rev.57,862–952(1993). 36.Himmelreich,R.etal.CompletesequenceanalysisofthegenomeofthebacteriumMycoplasma 11.Deckert,G.etal.ThecompletegenomeofthehyperthermophilicbacteriumAquifexaeolicus.Nature pneumoniae.NucleicAcidsRes.24,4420–4449(1996). 392,353–358(1998). 37.Kaneko,T.etal.SequenceanalysisofthegenomeoftheunicellularcyanobacteriumSynechocystissp. 12.Saier,M.H.JrComputer-aidedanalysesoftransportproteinsequences:gleaningevidenceconcerning strainPCC6803.II.Sequencedeterminationoftheentiregenomeandassignmentofpotential function,structure,biogenesis,andevolution.Microbiol.Rev.58,71–93(1994). protein-codingregions.DNARes.(suppl.)3,185–209(1996). 13.Kawarabayasi,Y.etal.Completesequenceandgeneorganizationofthegenomeofahyperthermo- 38.Fraser,C.M.etal.CompletegenomesequenceofTreponemapallidum,thesyphilisspirochete.Science philicarchaebacterium,PyrococcushorikoshiiOT3.DNARes.5,55–76(1998). 281,375–388(1998). 14.Boos,W.&Lucht,J.M.inEscherichiacoliandSalmonellaCellularandMolecularBiology(eds 39.Goffeau,A.etal.Lifewith6000genes.Science274,563–567(1996). Neidhardt,F.C.etal.)1175–1209(ASM,Washington,1996). 40.Huang,Y.-P.&Ito,J.ThehyperthermophilicbacteriumThermotogamaritimahastwodifferentclasses 15.Bronnenmeier,K.,Kern,A.,Liebl,W.&Staudenbauer,W.L.PurificationofThermotogamaritima offamilyCDNApolymerases:evolutionaryimplications.NucleicAcidsRes.26,5300–5309(1998). enzymesforthedegradationofcellulosicmaterials.Appl.Environ.Microbiol.61,1399–1407(1995). 41.Lawrence,J.G.&Ochman,H.Ameliorationofbacterialgenomes:ratesofchangeandexchange.J. 16.Felix,C.R.&Ljungdahl,L.O.Thecellulosome;theexocellularorganelleofClostridium.Annu.Rev. Mol.Evol.44,383–397(1997). Microbiol.47,791–819(1993). 42.Sutton,G.G.,White,O.,Adams,M.D.&Kerlavage,A.R.TIGRAssembler:Anewtoolforassembling 17.Vargas,M.,Kashefi,K.,Blunt-Harris,E.L.&Lovley,D.R.MicrobiologicalevidenceforFe(III) largeshotgunsequencingprojects.GenomeSeq.Technol.1,9–19(1995). reductiononearlyEarth.Nature395,65–67(1998). 43.Altschul,S.F.,Gish,W.,Miller,W.,Myers,E.W.&Lipman,D.J.Basiclocalalignmentsearchtool.J. 18.Janssen,P.H.&Morgan,H.W.HeterotrophicsulfurreductionbyThermotogasp.strainFjSS3.B1. Mol.Biol.215,403–410(1990). FEMSMicrobiol.Lett.75,213–217(1992). 44.Saitou,N.&Nei,M.Theneighbor-joiningmethod:Anewmethodforreconstructingphylogenetic 19.Gluch,M.F.,Typke,D.&Baumeister,W.MotilityandthermotacticresponsesofThermotoga trees.Mol.Biol.Evol.4,406–425(1987). maritima.J.Bacteriol.177,5473–5479(1995). 45.Claros,M.G.&vonHeijne,G.TopPredII:animprovedsoftwareformembraneproteinstructure 20.Lee,P.J.&Stock,A.M.Characterizationofthegenesandproteinsofatwo-componentsystemfrom predictions.Comput.Appl.Biosci.10,685–686(1994). thehyperthermophilicbacteriumThermotogamaritima.J.Bacteriol.178,5579–5585(1996). Acknowledgements.ThisworkwassupportedbytheUSDepartmentofEnergy,OfficeofBiologicaland 21.Macnab,R.M.inEscherichiacoliandSalmonellaCellularandMolecularBiology(edsNeidhardt,F.C. EnvironmentalResearch.WethankM.Heaney,J.Scott,D.MaasandB.Vincentforsoftwareanddatabase etal.)123–145(ASM,Washington,1996). support;R.Roberts,F.Kunst,andM.Simonforusefuldiscussions;andR.Huberforproviding 22.Hueck,C.J.TypeIIIproteinsecretionsystemsinbacterialpathogensofanimalsandplants.Microbiol. T.maritimaMSB8cells. Mol.Biol.Rev.62,379–433(1998). 23.Dubnau,D.BindingandtransportoftransformingDNAbyBacillussubtilis:theroleoftypeIVpilin- CorrespondenceandrequestsformaterialsshouldbeaddressedtoC.M.F.(e-mail:[email protected]).The likeproteins—areview.Gene11,191–198(1997). annotatedgenomesequenceandthegenefamilyalignmentsareavailableontheWorld-WideWebat 24.Gruber,T.M.&Bryant,D.A.Molecularsystematicstudiesofeubacteria,usingsigma70-typesigma http://www.tigr.org/tdb/mdb/.ThesequencehasbeendepositedinGenBankwithaccessionnumber factorsofgroup1andgroup2.J.Bacteriol.179,1734–1747(1997). AE000512. © 1999 Macmillan Magazines Ltd NATURE|VOL399|27MAY1999|www.nature.com 329 Amino acid biosynthesisBiosynthesis of cofactors, prosthetic groups, carriersCell envelopeCellular processesCentral intermediary metabolism Energy metabolismFatty acid/Phospholipid metabolismPurines, pyrimidines, nucleosides and nucleotidesRegulatory functionsDNA Metabolism Transport/binding proteinsTranslationTranscriptionOther categoriesConserved hypotheticalUnknown Signal peptideLipoproteinLPFe+++Transporter9GES regionParalogous gene family94Authentic Frame ShiftRepeat RegionArchaeal Islands 23S23s rRNA16S16s rRNA5S5s rRNA 4tRNA 1 kb RPT5ALR1aaXaaXaaXRPT6BaaXaaXFe2+aaXaaXaaXaaXaaXFe2+aaX??aaX6766667205520101152161664485061164849672120211654212021154155112515316154421646464520 201168435525224142663967692136381213105517029616585327331357127306564188373443444557146049725424409473150525623636259282617163219481546RPT7ARPT8AaaXaaXCOHCOHZnCOHCOHZnCOHZnaaXFe3+?Fe3+Fe3+COHNa+aaX9891289811667815851602551624601626146444415346133115157516315912071584046161156671642056671444444520445 721199611777981119788129142141136102121811001081228210690941287499929311875808312676731331131391351371251321308779858986849114013110495124105101781091101431121271201381141161341231151031Fe3+?TMrrnaA5STMrrnaA16STMrrnaA23S2107Fe3+?Fe3+????H+412169775354416646982598168170108174610816916729157683410447516551721711612271 143201155203209194157147156204165172205188160193198195177164149197199175183146176207169170163171196182154144168191202208166161148153162185151200179173158174159206180152189181190150184167187145186192178178COHCOHPi?COH?11769887666103424217410217910216418023671771251616173175173111261747316617618116172516616614416712816660178 259282237222236274258228232262248245211233281226227224268257247231234239276225266273260221250264240220243263270212217271223219216238269242235213229230218252256265246210267275255244280214215284241249279283261277254272251209253LR8278aaXaaXaaXaaXaaX?aaX?K+??+???1611666611130183601605631869924508750272014052121187155131341845185161172155859518220284308317288310313289306343285298352292290291349307322345350336353330335333287337347324323314303305293286296334297338318316302328311332355321312351294304301329300340320342299309331325354327348346344326295339LR7341315319RPT6ACOH?COHCOHCOHNH4++??1666106861212LR6100425189187643318718941464216416271187615614248188167161149161621701757416801613216185493313046101 379412422384371364395372365386402359367389398414392378417366383416369426397394388385380409373374413358387377403406428421357411424362396429355393370376381415391368390361425375410363404407423400360405356420419408401418382399427RPT4A3COH?COHaaX???COH158676192163422421506718519319141193113221904619217419242156532171105972742DAKOTA429474438482485444483459435472461453462466496458464434445493470481436486442480473471463449433465460441451494487477450489448437490456469491488468457443479476431440430467446455439452475454484495478432447492aaX?aaXaaXaaXaaXaaXaaXaaX+aaX666712702516512367141164311645201095201835194426920117533121985612412412634196181965195170422110418420516548538549563558547553559556554513534570508515555544527557519524509552511541564561505526545551521529518568522569497528531499512517502530537507533550520540506510498562560504536501523542543500546514539566503535532525565567496aaCOH?COHCOHaaaa1961410976136420264202642004541744242199643419455187123864197642198526430519246201 593600579580573576591582608574571624594616592587611632601618625626584629606620631609641578589636640628617621586577575583602597613619622634623572605633607585615581627639590630638596599610598588603595570614612635637604?9112062041291642061424519720752092031412054220842203205938634461681262515129 702704723691645683643662724695700709671673686705703680681660663665667699664687652677675655647644698685642670641697648693719679661714649706653656672688674646712692684708676713682654678651690711650669722666726718725721710668694715717689720696657716701659707658RPT3CrnpB??11166129346452767419031311622091620916951321591664184621018115815220122264102141150 785771730747791728759799803802792729765752800770763793796772756780801779735746731744748790755797750767749783736798741769762727738784739794737778761782758773760745754732757768787786781743742734764788789795766740751733776775753777726774COH?COHuraCOH?11112681169139424568741536714834834220841199468016718061200620780210513845195911461629816110217617783 872820824831814827803873836833807841825865864809874804835847866840858830848851862832877849871852844875816843823838859856811845850837813876868826805854817818822839829846815806834819861808828867853869860842870821810855812857863RPT6CribribribPi?Na+797971176444234179275716145199171065517418014548602044566188186241241821645135129109121391116161 903882912893939916925922924902904914894926920885938896888928908895889910946879934877892929909927919930890921949880935883899901887942945906943918881891931913944937936907932905948952900923951950884955941878953917947933940956911915897886954RPT3ARPT7BRPT8B898?rib??rib?1112116998661981604131914514421344225515146251557 956102910071028101798310141004100210189881019968990100010129979869679649919699729821024101610239791011101598497510059939769639621027101097798998710311022992102096696199899597410089781013981973103098595910031021100696595810321026100996095797110011025994980RPT4B996999970?K+??aaXaaXaaXaaXaaXK+K+13611666106611131195153219538371291422271747302324521292014131434261631253581018261536332820 11051088109610531094105211001093107810381076106010891034104210841087109510621112105411101036106810751061104510401083110410411049104310511063109010701058110110921097106610391067108010721098107310651069104811031091107410461033105510371099108611061085106410501082110210441077103510591057107110471079103211111109110710811056g3Pibcaabcaabcaabcaag3PiaaXaaXaaXaaXaaXg3Pibcaa?Mg2+?666696784342443942464045463415146492021424252055391641394613484452474211462545550 1182113911581170113111691161113511241178116811811164113811621129113711771165115511141121116311361144111811671112113311171140114111421145117311561122112511591149112711791147112611131119115011321154114811301120115711431146118011521153112311721128118411161175111511341160115111711183116611741176COH11COH?COHaaXaaX?malaaXaaXaaXaaXaaXaaXmalaaXaaXmalaaX666131016686688642564416560593619573560683669634661202142215205205425582155205338375662164166675442JAMAICA124012381239121512161187120112311213121412371192124712121205119512071210119112111217119312351229118512301188119412251199120312261224119012221220124112211197120912191236119611981228124312341202122711891204122312461245120012481218123312321249118412421208120611861244RPT1BPi??COHPiPiPi?????611666642278582165283758011835573704614271165828457976168177257833811183558184225747383533337253342Barb131812691277132812811327132113021317125312561319131312861310129512551257127813241280130112541294127313251283127012901288132613031265132213041282129312791311129212641258127212871329131512681275128413091266127413051252125113141299128512621260129712911323130712631259125012711267126112961276124912891300131612981306RPT1A132013081312s/pant?s/ps/p?s/pant??2112117106867428671968990545934652787419288947691984585293190643481339994595974542 13841400139013691371140113961357136813911404138213981383138513581349139414031393136613801409136413401339139213881356136313511350136213421406133513331372134513541370135314081375140213371330138713611336135213471332133413781359140713811373140513951348138913741376139913791377139713601346134413551338134113651343133113861367?glyc19776710064671088864104617751111031628106421091616112751051051694244611067671610216107101 1441145514721465142514261447143214241446146814141452144914511466144214371479142714691459146414171409144414341418146714211430141614201496150014401443148914871423143114281480141914131457145014391429143814481411148314601473143614101481146114851493141514331422147114531499150114451494146314701491149715021475147714781454148814351498148614921495147414581456149014821484LR4146214761412197786117123931191199266671141131247412012212510211616115931612111834 155015361584156415281537155715231519156315701511158215091513151815581542153015241567150815201576150615211525154715401522151615541532157315511581153515451543158515831577157515261510151515531538156115331562153115441517154815491504155915411539151415121578155615651529150715801505150315721574156915521571154615791560156815021555156615271534LR3??321076126110130164616344546712511736712767131102111421091946111128981325129KINGSTON16181619163216381633161216501629163116581624158916431609161016461616159816361607164816421595162016401608161415881593161116011660158716351594163916271637162116061585165616591628159616521625159715911641163015991647160016221617163416451651162316031626166216041653165416551657160216611644160516151613159015921586RPT3D1649?aaXaaX268106676101361362042133139134787142714064461425111721278821107522464613731965042138951411131351729170917321730173117141686172416921701171917251726167416971708170316831733169317411717171516981685168716631721166416791744172817071735172316951710169916721673168216781671171217401691172017221727170517381702174216881669174516771690174817471739166717461675166516961713170017061704167617161694168116621737171816661734174317361668LR216801670171116841689RPT2B1aaXchro?chro?1aaXaaX16810201146411879108143461451442911743551481481649146115481082514421471453214316147 181817651777179417921772178117591770182017601819178017851787175617961778175417841797181617821791177117861768175317521761180617881811181417511799176617731776178318221793174917891779182318121774182117691804180818091750180318001801179817751810180717551805176718021817179518131763176217481758179018151764RPT3BCOHCOH?malmalmalCOH1757666421660894661149132150441421324146424211613214862151261376534224242 18571834184618251867185918621868186118511874187818401848186318261847186918761871182418601852183818501829185518641858187718661856183518441837187318361875184118421839184918311865187018531833184318541832184518281827187218231830 LR1 RPT5A 101 64 120 16 16 16 152 42 48 16 48 49 50 RPT6B42 153 5 aaX 5 aaX 720aaX20 6aaX 21 aaX 61 16 46 112 5 ? 6 46 4 16 Fe2+ 7Fe2+67 21 aaX? 5 aaX 5 aaX20 aaX 6 2a0aX 6 154 155 154 aaX 21 6aaX a2a0X126a2a0X 6 3a5a4X aa5X 5153 6 156 57Fe3+ F4e43+8 91957 Fe3+? 10 158 11 12 13 14 15 16 1217 18 19 20 21 40 22 23 24 12205 2615927 67 28 29 30 318 32 3C3OH 5 34 9C4O4H 358COH4436 37 46 38 16309 4066 41 6412 16143 9CO44H44 4546COH4670 C4O8H 459 50 24 51 52 53 54 55 162 56 115Zn 57Zn5 588Z4n4 4659 71 60 Na+ 133 61 117 4 61258 63163 1664 RPT476A5RPT86A678 67 16684 69 70 25 71 4762 72 73 74 75 76 77 78 79 80 81 82 83 8485 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 46 531 98 68 165 122 7 166 29 104 167 16H+ 168 108 107 TMrrnaA16S 2 TMrrnaA23S TMrrnaA5S 34 25 169 46 71 1 Fe3+?157 9Fe434+? 5 Fe3+1?7 4 7 5 ? 2 98 170 16 ?171 ?5 108 47 172 143 144 145 146 147 148 149 150151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169170171 172 173 174 175 176 177 178178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 Amino acid biosynthesis 172 173 173 111 128 ? 5 174 164 6 16 16 8175 26 166 176 8 166 177 9 166 166 102 102 73 16 7 6178 6 103 167 179 11 Pi?180 181 174 67 125 144 60 COH 7C4O2H 6C4O2H 23 Biosynthesis of cofactors, prosthetic groups, carriers 209 22410 211 161212 182213 6? 5 214 ? 215 5216 127127 218 182319 182420 221 222 223 221455 22550 226185227 60228 aaX? 22291 230 623a1a2X0 6a22a30X2 aaX5233aaX5 234 235 236 237 186 238 239 58 240 aa2X41 21242 243 24459 245 246 182747 2K4+8?1249 250 2512652+251330254255256 22577 258? 259 11260 26150 63262 99 2163 264 140265 266 267 131 268 269 27034 271 12172?87 273 274 160 275 276 277 278 6 LR2789 280 5 ? 281 282 283 284 CCeellll uelanrv eplroopceesses Central intermediary metabolism 284164 285 286 128887 288 289132 290 291189 292 102973 294170 295 296 28907 2+9?8 12 299 74 300 3016 LR1730012 303 304 16 133005 16 138096 162 307 308 162 33 630933 6 5 ? 310100 311LR6 312 64 313 31641315316 31716 16318 31931260321 322 3231 46324 71 325 326 123N2H74+ 328149 32499 48330 33146 332 333 334 61335 185336 337187 338339340341 6342 COH?41343 63C444O2HCOH43245COH5346 347 161 348 1753498350187 187 351 352 353 354 RPT4623A55 355 356 357 358 359 360361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 Energy metabolism C4O2H 6C4O2H 7 41COH? RP2T24A 156 185 190 50 24 32 67 191 5 21 aaX? 192 174 46 97 6 10 163 1 42 193 5 8 3 5 ? 171 ? 27 193 113 192 192 Fatty acid/Phospholipid metabolism 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 DA4K6O7TA 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486487 488 489 490 491 492 493 494 495 496 Purines, pyrimidines, nucleosides and nucleotides 5 aaX 5 aaX 5 aaX 20 aaX20 aaX 6 141 123 194 170 31 34 126 98 67 69 aaX5 aaX 21 6a2a0X 6aaX20 109 42 195 + 164 33 7 5 ? 56 164 117 104 196 196 183 184 70 25 12 18 165 121 42 124124 Regulatory functions 516 DNA Metabolism 497 498 499 500 501 502 503 504505 506 507 508 509 510 511 512 513 514 515 517518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 496 197 6 192 198 1 34 30 7 187 2 9 45 74 199 5 aa aa 46 aa 200 13 41COH? 9COH42 COH42 194 6201 8 202 55 202 64 64 14 64 64 64 64 10 123 64 52 Transport/binding proteins 570 571 126572 57432 57425 575 576 57374 578 579 86 580 581 582 168 583 584 165485 586 581797 203588203 589 590 591 592 59393 594 595 596 597129 598 599 204600 164011 602 603604605606 607 608 260059 610 205611612613614 615 616 46617206 618 561?9 620 11621 622 622307 62442 625 626 627 45 628 629 630 206631 29 632 613432634635636637 638 151 9 632908 624009641 TTrraannsslcartipiotnion Other categories 642109 642 209 643 92 rnpB 644 645 646 647648649 650181651 652 653 654 655 656 46657 65865931660 616519 662 6663 64664 665 666 667 668 669 670 31671 1 67222 62773 16 674 152675 676 677 66478 679 680 681 682 683156884 685102 68664 687 1168188 689 66905691?190692693 136294 695210696 691769698 69970073041 701241 703 70467 705 RPT43C706 150 707 176028 709 710 711 712 71136 714 715 712601 ? 5 71717218 719 720 721 722 723 724 725 726 UCnoknnsoewrvned hypothetical 726 7279177 71238 729 731062731 73621 733 157334 COH41735 67C34O62H CO7H432?7 68738 739 714308 9 207741 12742 87043 7441 74571416 ura724070 161748 7149 750 751 7652 83753 75483 7?555 45 75634 757 75874 759 760 91 761 762 7638177664 765 766208 76746 768 76945 77607 210771 174762 773774 775 1 77671 777 778 779 780781 782 783784 78580786 768780 718082 789 790 719418 792 793 16794 791599 796 797 798 79998 800 801 802 195 803 Signal peptide 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 48 16 Na+ 9 11 34 199 45 7 57 17 109 129 16 124 7 135 6 106 139 188 179 9Pi?180 4R2PT6C 161 204 66 16 1112 45 182 27 174 7 60 186 24 145 145 r4ib4 7 rib5 5 rib Fe+L+P+ TLripaonpsrpootretienr 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 899 900901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 60 rib rib 155 191 25 198 4 646 42 42 51 951 898 134 1151 RP4T7BRPT8B 1 8 1 2 9 3 ? 12 RP4T3A ? 5 6? 6 11 ?7 9 GES region 94 Paralogous gene family 9561195?77 958 8 959 960 9 9619 96210 963 11 964 965 966 9647 96896997097112 1937213 973974 14975 1947697797184 979980981 982 9?835 15 K+?984 16 17 19885 11986 91897 5 aaX 5 aaX29088aaX 6 a2a0X 6989 21 a9a9X0 RP2T24B 991 23 92942 993994995996 99275 9982969910001001 1002 217003 281004 1005 1006 1007 1008 120909 1010 1011 6130012K+15 101103K+ 311014 1015 1016 1017 1018 101932 1020 10216 10226313023 1022641025 130426 35102713361028 1029 13073010316381032 RAuetpheeantt iRc eFgriaomne Shift 1032103339 130934 391035 1036 1037 1038401039 1040 1041 104g23Pi140143 6104g43Pi42 6g3P41i2045 1046 1047 1048 1049 1050 1051 1054231053 101514 10551056 1057 10586 1059 bca1a060 810b6c1aa44 7bc4a14a062bca5a bca15a06345 10664 1065 421066 140667 140268 106946 1070 1071 2010a7a2X 6 107321 aaX 5 aa1X074 5 aaX 20 aaX1075 1076 1077 1078 1079 10802150811082108313 1084 45 Mg2+? 108457 416086 1087 481088 491089 34 1090 1091 1092 1093 50 109446 ? 1095 1096 10979 11609581109952 1100 1101 1102 1103 1104 114025 11061107 1109 11110 1111461112 Archaeal Islands 1112 141613 111453115145 8 111106 11117 551118 1119 56 115270 112158 1122 1123 19 11241125 15126aaX 1127591128 1129 511a3a20X0 aaX 1162301 aaX 61132 211133 aaX 113640 1135 1136 113742 11m38al 16139 141240mal11418 114421 mal 1143 114347 611348535 13 36 1146 16 361147 81148 1149 1150 161151 115260 11535 aaX11545 aaX112505 aaX2011a5a6X 11657 115281 aaX 115961 116062 1161 21 1a1a6X2? 1163 116643 1165 116646 1167 65 1168 5 CO1H16C94O2H? 6C14O127H0 6 141171 117266 1173 117141715176 1177 161778 1179118011181 1182 68 116893 1184 JAMAICA 1184 70 11851186 1187 1188 25 52 1189 11 ? 11590 1191 1192 71 46 712193 5 Pi P4i2 6114P92i4 6 P1i195 1196 171397 1198 73 119191 1200 1201 1202COH5 120374 1204 4212051206120712081209 1210 122121 121275 716213 12141215 1216 121777 78 121186 79 121196 121260 122721 11122280 1223 1224 1 128215 82 122683 ? 5 1227633 633122833 1229 ? 12350 841231 81 123282 123383 1234? 15235 1236? 12375 182438 8R11P2T319B 124805 124112428132431244124583 1246633 ?5 12?47 51248 1249 23S 23s rRNA 182649 1250 R81PT11A251 815252 123543 1254 1255 1012?5867 1257 761258 1B2a5rb9 12602 1261 126828 1263 1264 126545 1266 1267 152468 17269 127089 1271 90127245 901273 1274 1275 1276 71 1277 46 122798 1279 192180 912281 913282 12813?51284 192485 94 1286 95 1287 1288 12819290 11s2/9p141 129s2/p51293 64s2/p 172944s/2p 1295 1296 1297 129812991300 1301 1302 193603 271304 123015390761307183103809 193810 131113113213 1314 13115316 1317 31 1318 1319 13210321 132299 132a3nt?5 1632a4nt?33 641325 1326 1327 132811 1329 16S 5S 51s6 sr RrRNNAA 13301331 1332 641331300 101 1334 13315 94 5133?6 1021337 13381339 13401613411613421314033 1344 161345 16 1346 116347 1014348 134429 1350 7 gl1y3c51 24 13521315035 1354 13551356 113657 1358 105 1359 1360 1361106 107 1362 28 1363 136413651366138687 1368 671369 75 1370 137164 1372 13731081374 1375 109 1376 1377 1378 1379 1380 1387171382 1383 1384 1385 138613871388671389 7461390 110 139161 13912111393 1394 1395 1396 112 1397 139891399 1400 14011402 1403 1404 1405 1406 1407 1408 61740697 4 tRNA 140967 1410 141114114213 1131414 7 1415 1416 1417 1418114 1419 1420 1421115 1422 14231161424 1171425 102 1426 1427118 1428LR4 1421919 1141390 1201413211 8 1432 7 1433 143414351436161437 16 1438122 14639 134440 1441 14421443 144493 913445 74 14146 1447 1448 1449 1450 123 1451 9 1452 1453616454145514516457145814591460 1461 14612463 1464 1465 1466 1467 1241468 1469 1470 1471 1472 147314741471547614771478 1479 1480 148114182251483148414851486148714881489149014911942921493 1494 14951496 1497 14981499 15001501 1502 1 kb 1502 150346 21504150515061261507 15083 1509 151016 1511 1512 46 1513 1514151545 1516127 1517 1105918 1519 1520 341521 1522 11052?33 117 1524 1525 15261115027 1528 1512191 1530 1531 111 1532 153132185341535 16536 1291537 1538 1539 1540 1310541 1542 154367 1544 1545 19 1546 1547 1021548 1549 1550 155198 1552 1553 15541555 1556 1557421558 1559 1560?51561131 1562 1615631564 1565 15661567 1L5R6831569 15701571 1572 1573 1574 132 1575 157667 1577 1578 25 157719 145680 17581 1582 1583 1584 1585 KINGSTON 15851585615?8727158871 133 1589 13415901592111355921593 1594 RP4T3D 461595 1316596136 159746 15981591937160808 1601 1602 1603 1604 160595 1606 1607 1608 160964 16106 169161 161112 1613 14661416151615621617101163188 1619 6162016123191622 1160273 1624 113 16251626 1627 1628 162978 1630 1631 1632 1633 42 1634 1635 1160362 140 1637501638 163931 1640 1641 146242 141 1643 1644 167425 16486 1647 16481649 1650 1651 127 71652 1653 1654 1 16551421656 1657 16a5a8X 162519 1660 1666a1a2X10662 6 166261a6a26X03 16564aa1X6655 a1a6X66143 1616473 R11P06T8628B166196174041671 164732 1673 144 1674 167516 167468 167749 1678 1167916181006881 1682 145168316841685 1686114 1687 168816891690 1691 1692 1619435 1694 1695 1696 641697 1698 8 1699 115 1700 17011 1702 1703 32 1118704 1705117 17061707 17082 110709 1 147610 171117121713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 LR12729146 1730 17311732 1733 1734 1735 1736 1737 1738 171369 719740 1741 172492 1741347 c1h4r1o77?44chro?1745 25 1746 148 17417481748 1147848 1749 1750 1751149 175246175R3PT43B1754150 174565 175262 1757 17581312759 1760 42 mal 16761 42 ma4l2 176241 mal 176317614312765 1766 1767 1768 1769 1770 1771 1173722 1773 1774 61 1775 1776 1777 17781779 1780 1781 65 1782 178362 1C748O24H 6 C147O28H5 6178641178C7OH? 601788 1789174920 1791 1792 1793 341794 1795 1796 1797 1798 1799 1800 1801 16 1801251 116 137 1808391804 180526 1806 1807 1808 1809 1810 1811 1812 18131814 18151816 1817 1818 1819 1820 1821 1822 1823 18231824 1825 1826 1827 1828 18219830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 18631864 1865 1866 1867 1868 1869 1870 18711872 1873 1874 1875 1876 1877 1878 LR1 RPT5A 101 64 120 16 16 16 152 42 48 16 48 49 50 RPT6B42 153 5 aaX 5 aaX 720aaX20 6aaX 21 aaX 61 16 46 112 5 ? 6 46 4 16 Fe2+ 7Fe2+67 21 aaX? 5 aaX 5 aaX20 aaX 6 2a0aX 6 154 155 154 aaX 21 6aaX a2a0X126a2a0X 6 3a5a4X aa5X 5153 6 156 57Fe3+ F4e43+8 91957 Fe3+? 10 158 11 12 13 14 15 16 1217 18 19 20 21 40 22 23 24 12205 2615927 67 28 29 30 318 32 3C3OH 5 34 9C4O4H 358COH4436 37 46 38 16309 4066 41 6412 16143 9CO44H44 4546COH4670 C4O8H 459 50 24 51 52 53 54 55 162 56 115Zn 57Zn5 588Z4n4 4659 71 60 Na+ 133 61 117 4 61258 63163 1664 RPT476A5RPT86A678 67 16684 69 70 25 71 4762 72 73 74 75 76 77 78 79 80 81 82 83 8485 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 46 531 98 68 165 122 7 166 29 104 167 16H+ 168 108 107 TMrrnaA16S 2 TMrrnaA23S TMrrnaA5S 34 25 169 46 71 1 Fe3+?157 9Fe434+? 5 Fe3+1?7 4 7 5 ? 2 98 170 16 ?171 ?5 108 47 172 143 144 145 146 147 148 149 150151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169170171 172 173 174 175 176 177 178178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 Amino acid biosynthesis 172 173 173 111 128 ? 5 174 164 6 16 16 8175 26 166 176 8 166 177 9 166 166 102 102 73 16 7 6178 6 103 167 179 11 Pi?180 181 174 67 125 144 60 COH 7C4O2H 6C4O2H 23 Biosynthesis of cofactors, prosthetic groups, carriers 209 22410 211 161212 182213 6? 5 214 ? 215 5216 127127 218 182319 182420 221 222 223 221455 22550 226185227 60228 aaX? 22291 230 623a1a2X0 6a22a30X2 aaX5233aaX5 234 235 236 237 186 238 239 58 240 aa2X41 21242 243 24459 245 246 182747 2K4+8?1249 250 2512652+251330254255256 22577 258? 259 11260 26150 63262 99 2163 264 140265 266 267 131 268 269 27034 271 12172?87 273 274 160 275 276 277 278 6 LR2789 280 5 ? 281 282 283 284 CCeellll uelanrv eplroopceesses Central intermediary metabolism 284164 285 286 128887 288 289132 290 291189 292 102973 294170 295 296 28907 2+9?8 12 299 74 300 3016 LR1730012 303 304 16 133005 16 138096 162 307 308 162 33 630933 6 5 ? 310100 311LR6 312 64 313 31641315316 31716 16318 31931260321 322 3231 46324 71 325 326 123N2H74+ 328149 32499 48330 33146 332 333 334 61335 185336 337187 338339340341 6342 COH?41343 63C444O2HCOH43245COH5346 347 161 348 1753498350187 187 351 352 353 354 RPT4623A55 355 356 357 358 359 360361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 Energy metabolism C4O2H 6C4O2H 7 41COH? RP2T24A 156 185 190 50 24 32 67 191 5 21 aaX? 192 174 46 97 6 10 163 1 42 193 5 8 3 5 ? 171 ? 27 193 113 192 192 Fatty acid/Phospholipid metabolism 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 DA4K6O7TA 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486487 488 489 490 491 492 493 494 495 496 Purines, pyrimidines, nucleosides and nucleotides 5 aaX 5 aaX 5 aaX 20 aaX20 aaX 6 141 123 194 170 31 34 126 98 67 69 aaX5 aaX 21 6a2a0X 6aaX20 109 42 195 + 164 33 7 5 ? 56 164 117 104 196 196 183 184 70 25 12 18 165 121 42 124124 Regulatory functions 516 DNA Metabolism 497 498 499 500 501 502 503 504505 506 507 508 509 510 511 512 513 514 515 517518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 496 197 6 192 198 1 34 30 7 187 2 9 45 74 199 5 aa aa 46 aa 200 13 41COH? 9COH42 COH42 194 6201 8 202 55 202 64 64 14 64 64 64 64 10 123 64 52 Transport/binding proteins 570 571 126572 57432 57425 575 576 57374 578 579 86 580 581 582 168 583 584 165485 586 581797 203588203 589 590 591 592 59393 594 595 596 597129 598 599 204600 164011 602 603604605606 607 608 260059 610 205611612613614 615 616 46617206 618 561?9 620 11621 622 622307 62442 625 626 627 45 628 629 630 206631 29 632 613432634635636637 638 151 9 632908 624009641 TTrraannsslcartipiotnion Other categories 642109 642 209 643 92 rnpB 644 645 646 647648649 650181651 652 653 654 655 656 46657 65865931660 616519 662 6663 64664 665 666 667 668 669 670 31671 1 67222 62773 16 674 152675 676 677 66478 679 680 681 682 683156884 685102 68664 687 1168188 689 66905691?190692693 136294 695210696 691769698 69970073041 701241 703 70467 705 RPT43C706 150 707 176028 709 710 711 712 71136 714 715 712601 ? 5 71717218 719 720 721 722 723 724 725 726 UCnoknnsoewrvned hypothetical 726 7279177 71238 729 731062731 73621 733 157334 COH41735 67C34O62H CO7H432?7 68738 739 714308 9 207741 12742 87043 7441 74571416 ura724070 161748 7149 750 751 7652 83753 75483 7?555 45 75634 757 75874 759 760 91 761 762 7638177664 765 766208 76746 768 76945 77607 210771 174762 773774 775 1 77671 777 778 779 780781 782 783784 78580786 768780 718082 789 790 719418 792 793 16794 791599 796 797 798 79998 800 801 802 195 803 Signal peptide 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 48 16 Na+ 9 11 34 199 45 7 57 17 109 129 16 124 7 135 6 106 139 188 179 9Pi?180 4R2PT6C 161 204 66 16 1112 45 182 27 174 7 60 186 24 145 145 r4ib4 7 rib5 5 rib Fe+L+P+ TLripaonpsrpootretienr 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 899 900901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 60 rib rib 155 191 25 198 4 646 42 42 51 951 898 134 1151 RP4T7BRPT8B 1 8 1 2 9 3 ? 12 RP4T3A ? 5 6? 6 11 ?7 9 GES region 94 Paralogous gene family 9561195?77 958 8 959 960 9 9619 96210 963 11 964 965 966 9647 96896997097112 1937213 973974 14975 1947697797184 979980981 982 9?835 15 K+?984 16 17 19885 11986 91897 5 aaX 5 aaX29088aaX 6 a2a0X 6989 21 a9a9X0 RP2T24B 991 23 92942 993994995996 99275 9982969910001001 1002 217003 281004 1005 1006 1007 1008 120909 1010 1011 6130012K+15 101103K+ 311014 1015 1016 1017 1018 101932 1020 10216 10226313023 1022641025 130426 35102713361028 1029 13073010316381032 RAuetpheeantt iRc eFgriaomne Shift 1032103339 130934 391035 1036 1037 1038401039 1040 1041 104g23Pi140143 6104g43Pi42 6g3P41i2045 1046 1047 1048 1049 1050 1051 1054231053 101514 10551056 1057 10586 1059 bca1a060 810b6c1aa44 7bc4a14a062bca5a bca15a06345 10664 1065 421066 140667 140268 106946 1070 1071 2010a7a2X 6 107321 aaX 5 aa1X074 5 aaX 20 aaX1075 1076 1077 1078 1079 10802150811082108313 1084 45 Mg2+? 108457 416086 1087 481088 491089 34 1090 1091 1092 1093 50 109446 ? 1095 1096 10979 11609581109952 1100 1101 1102 1103 1104 114025 11061107 1109 11110 1111461112 Archaeal Islands 1112 141613 111453115145 8 111106 11117 551118 1119 56 115270 112158 1122 1123 19 11241125 15126aaX 1127591128 1129 511a3a20X0 aaX 1162301 aaX 61132 211133 aaX 113640 1135 1136 113742 11m38al 16139 141240mal11418 114421 mal 1143 114347 611348535 13 36 1146 16 361147 81148 1149 1150 161151 115260 11535 aaX11545 aaX112505 aaX2011a5a6X 11657 115281 aaX 115961 116062 1161 21 1a1a6X2? 1163 116643 1165 116646 1167 65 1168 5 CO1H16C94O2H? 6C14O127H0 6 141171 117266 1173 117141715176 1177 161778 1179118011181 1182 68 116893 1184 JAMAICA 1184 70 11851186 1187 1188 25 52 1189 11 ? 11590 1191 1192 71 46 712193 5 Pi P4i2 6114P92i4 6 P1i195 1196 171397 1198 73 119191 1200 1201 1202COH5 120374 1204 4212051206120712081209 1210 122121 121275 716213 12141215 1216 121777 78 121186 79 121196 121260 122721 11122280 1223 1224 1 128215 82 122683 ? 5 1227633 633122833 1229 ? 12350 841231 81 123282 123383 1234? 15235 1236? 12375 182438 8R11P2T319B 124805 124112428132431244124583 1246633 ?5 12?47 51248 1249 23S 23s rRNA 182649 1250 R81PT11A251 815252 123543 1254 1255 1012?5867 1257 761258 1B2a5rb9 12602 1261 126828 1263 1264 126545 1266 1267 152468 17269 127089 1271 90127245 901273 1274 1275 1276 71 1277 46 122798 1279 192180 912281 913282 12813?51284 192485 94 1286 95 1287 1288 12819290 11s2/9p141 129s2/p51293 64s2/p 172944s/2p 1295 1296 1297 129812991300 1301 1302 193603 271304 123015390761307183103809 193810 131113113213 1314 13115316 1317 31 1318 1319 13210321 132299 132a3nt?5 1632a4nt?33 641325 1326 1327 132811 1329 16S 5S 51s6 sr RrRNNAA 13301331 1332 641331300 101 1334 13315 94 5133?6 1021337 13381339 13401613411613421314033 1344 161345 16 1346 116347 1014348 134429 1350 7 gl1y3c51 24 13521315035 1354 13551356 113657 1358 105 1359 1360 1361106 107 1362 28 1363 136413651366138687 1368 671369 75 1370 137164 1372 13731081374 1375 109 1376 1377 1378 1379 1380 1387171382 1383 1384 1385 138613871388671389 7461390 110 139161 13912111393 1394 1395 1396 112 1397 139891399 1400 14011402 1403 1404 1405 1406 1407 1408 61740697 4 tRNA 140967 1410 141114114213 1131414 7 1415 1416 1417 1418114 1419 1420 1421115 1422 14231161424 1171425 102 1426 1427118 1428LR4 1421919 1141390 1201413211 8 1432 7 1433 143414351436161437 16 1438122 14639 134440 1441 14421443 144493 913445 74 14146 1447 1448 1449 1450 123 1451 9 1452 1453616454145514516457145814591460 1461 14612463 1464 1465 1466 1467 1241468 1469 1470 1471 1472 147314741471547614771478 1479 1480 148114182251483148414851486148714881489149014911942921493 1494 14951496 1497 14981499 15001501 1502 1 kb 1502 150346 21504150515061261507 15083 1509 151016 1511 1512 46 1513 1514151545 1516127 1517 1105918 1519 1520 341521 1522 11052?33 117 1524 1525 15261115027 1528 1512191 1530 1531 111 1532 153132185341535 16536 1291537 1538 1539 1540 1310541 1542 154367 1544 1545 19 1546 1547 1021548 1549 1550 155198 1552 1553 15541555 1556 1557421558 1559 1560?51561131 1562 1615631564 1565 15661567 1L5R6831569 15701571 1572 1573 1574 132 1575 157667 1577 1578 25 157719 145680 17581 1582 1583 1584 1585 KINGSTON 15851585615?8727158871 133 1589 13415901592111355921593 1594 RP4T3D 461595 1316596136 159746 15981591937160808 1601 1602 1603 1604 160595 1606 1607 1608 160964 16106 169161 161112 1613 14661416151615621617101163188 1619 6162016123191622 1160273 1624 113 16251626 1627 1628 162978 1630 1631 1632 1633 42 1634 1635 1160362 140 1637501638 163931 1640 1641 146242 141 1643 1644 167425 16486 1647 16481649 1650 1651 127 71652 1653 1654 1 16551421656 1657 16a5a8X 162519 1660 1666a1a2X10662 6 166261a6a26X03 16564aa1X6655 a1a6X66143 1616473 R11P06T8628B166196174041671 164732 1673 144 1674 167516 167468 167749 1678 1167916181006881 1682 145168316841685 1686114 1687 168816891690 1691 1692 1619435 1694 1695 1696 641697 1698 8 1699 115 1700 17011 1702 1703 32 1118704 1705117 17061707 17082 110709 1 147610 171117121713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 LR12729146 1730 17311732 1733 1734 1735 1736 1737 1738 171369 719740 1741 172492 1741347 c1h4r1o77?44chro?1745 25 1746 148 17417481748 1147848 1749 1750 1751149 175246175R3PT43B1754150 174565 175262 1757 17581312759 1760 42 mal 16761 42 ma4l2 176241 mal 176317614312765 1766 1767 1768 1769 1770 1771 1173722 1773 1774 61 1775 1776 1777 17781779 1780 1781 65 1782 178362 1C748O24H 6 C147O28H5 6178641178C7OH? 601788 1789174920 1791 1792 1793 341794 1795 1796 1797 1798 1799 1800 1801 16 1801251 116 137 1808391804 180526 1806 1807 1808 1809 1810 1811 1812 18131814 18151816 1817 1818 1819 1820 1821 1822 1823 18231824 1825 1826 1827 1828 18219830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 18631864 1865 1866 1867 1868 1869 1870 18711872 1873 1874 1875 1876 1877 1878

Description:
Conservation of gene order between T. maritima and Archaea in many of the gene transfer may have occurred between thermophilic Eubacteria and Archaea.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.