Codon Bias Signatures, Organization of Microorganisms in Codon Space, and Lifestyle

A. Carbone,* F. Képès,† and A. Zinovyev‡

*Génomique Analytique, Université Pierre et Marie Curie, INSERM U511, 91, Bd de l'Hôpital, 75013 Paris, France; †ATelier de Génomique Cognitive, CNRS ESA8071/Genopole, 523, Terrasses de l'Agora, 91000 Evry, France; ‡Institut des Hautes Études Scientifiques, 35, route de Chartres, 91440 Bures-sur-Yvette, France

New and simple numerical criteria based on a codon adaptation index are applied to the complete genomic sequences of 80 Eubacteria and 16 Archaea, to infer weak and strong genome tendencies toward content bias, translational bias, and strand bias. These criteria can be applied to all microbial genomes, even those for which little biological information is known, and a codon bias signature, that is the collection of strong biases displayed by a genome, can be automatically derived. A codon bias space, where genomes are identified by their preferred codons, is proposed as a novel formal framework to interpret genomic relationships. Principal component analysis confirms that although GC content has a dominant effect on codon bias space, thermophilic and mesophilic species can be identified and separated by codon preferences. Two more examples concerning lifestyle are studied with linear discriminant analysis: suitable separating functions characterized by sets of preferred codons are provided to discriminate translationally biased (hyper)thermophiles from mesophiles, and organisms with different respiratory characteristics (aerobic, anaerobic, facultative aerobic, and facultative anaerobic). These results suggest that codon bias space might reflect the geometry of a prokaryotic "physiology space." Evolutionary perspectives are noted, numerical criteria and distances among organisms are validated on known cases, and various results and predictions are discussed both on methodological and biological grounds.
Key words: Prokaryotes, Archaea, codon bias, codon space, microbial evolution, microbial lifestyle.

E-mail: [email protected].

Mol. Biol. Evol. 22(3):547–561. 2005
doi:10.1093/molbev/msi040
Advance Access publication November 10, 2004
Molecular Biology and Evolution vol. 22 no. 3 © Society for Molecular Biology and Evolution 2004; all rights reserved.

Introduction

Statistical analyses of DNA sequences, and in particular of codon bias, have been performed from the moment that long chunks of DNA sequence became publicly available in the early eighties (Grantham et al. 1980; Wada et al. 1990), and the roots of these studies can be traced back to the sixties (Sueoka 1962; Zuckerkandl and Pauling 1965). However, with the increasing number of bacterial genome sequences from a broad diversity of species, this field of research has been revivified in the last 5 years (Koonin and Galperin 1997; Lin and Gerstein 2000; Radomski and Slonimski 2001; Knight, Freeland, and Landweber 2001; Sicheritz-Pontén and Andersson 2001; Daubin, Gouy, and Perrière 2002; Lin et al. 2002; Lobry and Chessel 2003; Sandberg et al. 2003). Pioneer work in inferring bacterial similarity relationships using large chunks of genomic sequences is attributable to Karlin. In a series of papers starting with (Karlin 1994; Karlin, Ladunga, and Blaisdell 1994), Karlin et al. showed how dinucleotide relative abundance values (profiles) of different DNA sequence samples of size ≥ 50 kb from the same organism are generally much more similar to each other than they are to profiles from other organisms, and that closely related organisms generally have more similar profiles than do distantly related ones.

The interest of comparing organisms leads to the problem of defining biologically meaningful spaces from which to extract new insight into organism similarities. Spaces arising from a direct statistical analysis of genomic sequences, based on dinucleotide frequencies (Karlin 1994; Karlin, Ladunga, and Blaisdell 1994; Karlin and Mrázek 1998), as well as codon usage, synonymous codon usage, and amino-acid usage (Kreil and Ouzounis 2001; Tekaia, Yeramian, and Dujon 2002), organize organisms roughly in a similar manner: relative distances among most phylogenetic genera are preserved across spaces. There are pairs of organisms, though, whose relative distance may vary considerably depending on the space one chooses (see later). The main motivation for this work was to define a space whose coordinates are mathematically well defined as well as justifiable by biological intuition, and to revisit organism distances within this framework. Mathematical rigor is a particularly important requirement for genome comparison; suitable biological properties can then be tested, validated, and possibly predicted across organisms.

An organism is defined through the set of preferred codons shaping its genome. The basic idea is simple and goes back to two main facts: first, the genetic code associates a set of sibling codons to the same amino acid, and some codons occur more frequently than others in gene sequences (Grantham et al. 1980; Wada et al. 1990); second is the hypothesis, formulated by Sharp (Sharp and Li 1987), that for each genome sequence G, there is a set of coding sequences S, constituting roughly 1% of the genes in G, which is representative of the dominating codon bias in G. Many observations support this hypothesis: for bacteria and small eukaryotes like Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster, for instance, which are governed by translational bias, this set is constituted mainly by ribosomal, glycolytic, heat-shock proteins, and elongation factors; for Pseudomonas aeruginosa, the set contains the proteins with the highest GC3 content; for Borrelia burgdorferi the set is constituted solely by genes lying in leading strands (Carbone, Zinovyev, and Képès 2003). Combining the two facts together, one can define weights for codons on genes in S, which are representatives of codon preferences, as follows. Given an amino acid j, its synonymous codons might have different frequencies in S; if $x_{i,j}$ is the number of times that the codon i for the amino acid j occurs in S, then one associates to i a weight $w_{i,j}$ relative to its sibling of maximal frequency $y_j$ in S:

$$w_{i,j} = \frac{x_{i,j}}{y_j}. \qquad (1)$$

Such weights have been successfully used by Sharp to correlate expression levels to translational codon bias (Sharp and Li 1987). (Notice that weights equal to 1 do not correspond to codons which are the most frequent over the entire genome: more than 10 amino acids in Bacillus subtilis, for instance, are preferentially coded with codons other than those which are the most frequent over the entire genome.) Weights calculated over S allow us to define the codon adaptation index (CAI) (Sharp and Li 1987), which produces a rank of all genes in a genome agreeing with dominating codon bias: genes ranking the highest are the most biased and those ranking lowest are the least affected by selective bias. More generally, it has been shown that CAI correlates to any kind of dominating bias in genomes (like GC-content, preference for codons with G or C at the third nucleotide position, a leading strand richer in G+T than a lagging strand), and not just to translational bias (Carbone, Zinovyev, and Képès 2003). Moreover, an algorithm for the automatic detection of S from the collection of all genes in a genome has been proposed in (Carbone, Zinovyev, and Képès 2003); since the algorithm is not based on any biological knowledge of the organism, it allows us to determine weights for those genomes for which not much biological information is available.

Weights are highly specific to a genome, they can be defined for any microorganism, they are good indicators of the evolutionary process which the organism has undergone, and they seem shaped by the metabolic constraints of the organism during evolution (Wagner 2000). For these reasons, we use codon weights to represent genomes (a genome becomes a [normalized] vector of 64 weights).

In the first part of this article, we present new and simple statistical criteria that correlate a bias of a given origin (content bias, translational bias, strand bias) to CAI values of genes. Each criterion, being bias specific, allows us to infer weak or strong tendencies of a genome toward the bias, and possibly provides a numerical evaluation of the strength. Suitable numerical thresholds are proposed, and they allow for an automatic detection of a codon bias signature, that is the collection of strong biases displayed by a genome. Two of the criteria allow us to determine whether an organism is affected by some (weak or strong) form of translational bias, and in this case to infer putative gene expression levels for the organism. This is done with no use of gene expression data (Jansen et al. 2003) nor of gene classification and protein class comparison (Karlin and Mrázek 2000; Mrázek et al. 2001; Karlin et al. 2003). Our criteria can be applied to any genome for which no biological knowledge is yet available. All numerical criteria have been validated on previously established analysis of codon bias; contrary to what has been claimed by Anderson and Sharp (1996), a tendency toward translational bias has been detected for Rickettsia prowazekii. Predictions on newly sequenced genomes have been deduced.

In the second part of this article we describe a codon bias space where genomes are identified by their specific 64 codon weights, and we introduce a linear distance between genomes, which is new to comparative analysis. Distances among organisms are validated on known cases of strains, species, and established phylogenetic branches. Codon bias space is proposed as a novel formal framework to interpret genomic relationships and biologically important features including lifestyle and evolutionary trends. Principal component analysis is applied to codon bias space and confirms that although GC content has a dominant effect, optimal growth temperature explains the second principal component (Lynn, Singer, and Hickey 2002). As a consequence, thermophilic and mesophilic species can be identified and sharply separated by codon preferences. Using linear discriminant analysis, two more examples concerning lifestyle are studied, and suitable separating functions characterized by sets of preferred codons are provided to discriminate translationally biased (hyper)thermophiles from mesophiles, and organisms with different respiratory characteristics: aerobic, anaerobic, facultative aerobic, and facultative anaerobic. These results suggest that codon bias space might reflect the geometry of a prokaryotic "physiology space." Evolutionary perspectives are noted, and various results are discussed both on methodological and biological grounds.

Materials and Methods

Genomes and Replication Origins

Genomes, along with gene annotation, were retrieved from the Genomes directory of the GenBank FTP (see table 1). All coding sequences (CDS) were considered, including those annotated as hypothetical and those predicted by computational methods only. From each CDS, we excluded initiation and stop codons.

Information on the replication origin and terminus for 38 bacteria has been taken from the following web site, http://pbil.univ-lyon1.fr/emglib/emglib.html, where the prediction of these locations was based on the work of Lobry (1996). For most organisms in table 1, this information is still unknown.

Mesophilic, Thermophilic, and Hyperthermophilic Genomes

Mesophiles are organisms with an optimum growth temperature (OGT) near 37°C; thermophiles have OGT between 45 and 65°C, and hyperthermophiles have OGT ≥ 65°C, preferably around 80°C or higher (http://www.mblab.gla.ac.uk/dictionary).

Table 1
Thermophilic, Hyperthermophilic, and Mesophilic Eubacteria and Archaea

AQUIFICAE
  1H   Aquifex aeolicus
CYANOBACTERIA
  2    Nostoc sp
  3    Synechocystis PCC6803
  4    Thermosynechococcus elongatus
ACTINOBACTERIA
  5    Bifidobacterium longum
  6    Corynebacterium efficiens YS-314
  7    Corynebacterium glutamicum
  8    Mycobacterium leprae
  9    Mycobacterium tuberculosis CDC1551
  10   Mycobacterium tuberculosis H37Rv
  11   Streptomyces coelicolor
FIRMICUTES Bacillales
  12   Bacillus halodurans
  13   Bacillus subtilis
  14   Listeria innocua
  15   Listeria monocytogenes
  16   Oceanobacillus iheyensis
  17   Staphylococcus aureus Mu50
  18   Staphylococcus aureus MW2
  19   Staphylococcus aureus N315
FIRMICUTES Clostridia
  20   Clostridium acetobutylicum
  21   Clostridium perfringens
  22H  Thermoanaerobacter tengcongensis
FIRMICUTES Lactobacillales
  23   Lactococcus lactis
  24   Streptococcus agalactiae 2603
  25   Streptococcus agalactiae NEM316
  26   Streptococcus mutans
  27   Streptococcus pneumoniae R6
  28   Streptococcus pneumoniae TIGR4
  29   Streptococcus pyogenes
  30   Streptococcus pyogenes MGAS315
  31   Streptococcus pyogenes MGAS8232
FIRMICUTES Mollicutes
  32   Mycoplasma genitalium
  33   Mycoplasma pneumoniae
  34   Mycoplasma pulmonis
  35   Ureaplasma urealyticum
FUSOBACTERIALES
  36   Fusobacterium nucleatum
SPIROCHAETALES
  37   Borrelia burgdorferi
  38   Treponema pallidum
  39   Leptospira interrogans
CHLAMYDIALES
  40   Chlamydia muridarum
  41   Chlamydia trachomatis
  42   Chlamydophila pneumoniae AR39
  43   Chlamydophila pneumoniae J138
PROTEOBACTERIA alpha
  44   Agrobacterium tumefaciens C58 Cereon
  45   Agrobacterium tumefaciens C58 UWash
  46   Brucella melitensis
  47   Brucella suis 1330
  48   Caulobacter crescentus
  49   Mesorhizobium loti
  50   Rickettsia conorii
  51   Rickettsia prowazekii
  52   Sinorhizobium meliloti
PROTEOBACTERIA beta
  53   Neisseria meningitidis MC58
  54   Neisseria meningitidis Z2491
  55   Ralstonia solanacearum
PROTEOBACTERIA epsilon
  56   Campylobacter jejuni
  57   Helicobacter pylori 26695
  58   Helicobacter pylori J99
PROTEOBACTERIA gamma
  59   Buchnera aphidicola Sg
  60   Buchnera sp
  61   Escherichia coli K12
  62   Escherichia coli O157H7
  63   Escherichia coli O157H7 EDL933
  64   Haemophilus influenzae
  65   Pasteurella multocida
  66   Pseudomonas aeruginosa
  67   Salmonella typhi
  68   Salmonella typhimurium LT2
  69   Shewanella oneidensis
  70   Shigella flexneri 2a
  71   Vibrio cholerae
  72   Wigglesworthia brevipalpis
  73   Xanthomonas campestris
  74   Xanthomonas citri
  75   Xylella fastidiosa
  76   Yersinia pestis CO92
  77   Yersinia pestis KIM
CHLOROBIALES
  78   Chlorobium tepidum TLS
DEINOCOCCUS/THERMUS
  79   Deinococcus radiodurans
THERMOTOGALES
  80H  Thermotoga maritima
ARCHEOGLOBALES
  81H  Archaeoglobus fulgidus
METHANOBACTERIALES
  82H  Methanobacterium thermoautotrophicum
METHANOPYRALES
  83H  Methanopyrus kandleri
SULFOLOBALES
  84H  Sulfolobus solfataricus
  85H  Sulfolobus tokodaii
THERMOPLASMALES
  86T  Thermoplasma acidophilum
  87T  Thermoplasma volcanium
DESULFUROCOCCALES
  88H  Aeropyrum pernix
HALOBACTERIALES
  89   Halobacterium sp
METHANOCOCCALES
  90H  Methanococcus jannaschii
METHANOSARCINALES
  91T  Methanosarcina acetivorans
  92T  Methanosarcina mazei
THERMOCOCCALES
  93H  Pyrococcus abyssi
  94H  Pyrococcus furiosus
  95H  Pyrococcus horikoshii
THERMOPROTEALES
  96H  Pyrobaculum aerophilum

Nucleotide Frequencies: Some Definitions

GC-content is the frequency of G + C base pairs (bps); GC3-content is the frequency of G + C bps at the third codon position (excluding Met, Trp, and termination codons); XY-skew is defined as $(N_X - N_Y)/(N_X + N_Y)$, where $N_X$, $N_Y$ represent the frequencies of the nucleotides $X, Y \in \{A, T, G, C\}$, with $X \neq Y$.

Computation of CAI Values

The algorithm proposed by Carbone, Zinovyev, and Képès (2003) is used to detect a set S of genes which are representative of the dominating codon bias in a given genome. This reference set S contains the 1% most biased genes of the genome (the size of S corresponds to the one suggested in Sharp's original work (Sharp and Li 1987)). From S, one computes weights $w_{i,j}$ for codon i and amino acid j as in equation (1). These weights are then used to compute the CAI for all genes, $\mathrm{CAI}(g) = \left(\prod_{k=1}^{L} w_k\right)^{1/L}$, where g is a gene, $w_k$ is the weight of the kth codon in g, and L is the number of codons in g (Sharp and Li 1987).

Notice that the "preference" of a codon among synonymous ones is identifiable by a codon weight equal to 1. Theoretically speaking, multiple synonymous codons (possibly all n synonymous codons of an n-fold degenerate amino acid) might take value 1, and one can think of those as being equally preferred. In practice, no equally preferred codons ever occurred in our analysis of 96 organisms. In particular, it should be noticed that equal codon preferences represent a possible, but merely theoretical, condition under which homogeneous codon composition (that is, the absence of compositional bias, strand bias, and translational bias) can take place.

Codon weights, reference set S, and CAI values are calculated with the program CAIJava written by the authors, which uses parsers of GenBank flat files from the BioJava (http://www.biojava.org) programming package. The idea of the algorithm is simple. It is an iterative algorithm that at iteration i + 1 computes codon weights based on a set S of genes selected at iteration i, then ranks all genes with respect to CAI value and selects a new set S, which has half the cardinality of the set determined at iteration i (if at the ith iteration the selected set is already constituted by 1% of all genes, then the new set will also be constituted by 1% of genes) and whose genes score the highest. The process is repeated until 1% of genes have been selected and convergence is reached. At the start, S is the set of all genes. A description of the algorithm and a validation of the approach are reported in Carbone, Zinovyev, and Képès (2003). The program CAIJava is available at http://www.ihes.fr/~carbone/data.htm.

Plasmids

A chromosome is distinguished from a plasmid by assuming that it contains genes which are essential for metabolism under all growth conditions, i.e., housekeeping genes; plasmids generally provide gene products that can benefit the bacterium under certain conditions, such as resistance to antibiotics (Madigan, Martinko, and Parker 2000). Some prokaryotes contain more than one chromosome, such as Methanococcus jannaschii (3), Vibrio cholerae (2), and members of the genera Agrobacterium (2) and Brucella (2). Of the organisms we considered, 22 contain plasmids, but for only 6 of them is the ratio P/C, where P is the number of bps in the plasmids and C is the number of bps in the chromosome(s), > 10%. In particular, B. burgdorferi, which has a linear chromosome and 21 circular and linear plasmids, has P/C = 66%.

The calculation of t-values, concerning leading and lagging strand bias, is done considering chromosomal CDSs only. This is particularly important to correctly evaluate those genomes, like B. burgdorferi, whose P/C is large.

Space of Organisms and Its Visualization

An organism is represented by a 64-dimensional vector, whose entries correspond to the 64 codon weights $w_i$ of the organism computed for a set of genes S, which is representative of the dominating codon bias of the organism. (The stop codons UAA, UAG, and UGA, together with UGG and AUG, which have no synonymous codons, could be disregarded; this makes no substantial difference to the determination of the reference set S or to the 3D visualization.)

Hence, an organism is a point in the 64-dimensional space $[0...1]^{64}$, where no special assumption is made on the space nor on the coordinate system. The set of points is visualized in 3 dimensions by using principal components analysis (PCA) (Hotelling 1933; Hand, Mannila, and Smyth 2001): first, every coordinate is normalized on unity standard deviation to take into account equally dominating as well as rare codons (following the standard procedure employed in PCA, the normalized weight $w^{k*}_i$ for codon i in organism k is defined as $(w^k_i - \bar{w}_i)/\sigma_i$, where $w^k_i$ is the weight of i in k, $\bar{w}_i$ is the average weight of i computed with respect to all organisms k, and $\sigma_i$ is the standard deviation for the set of weights $w^k_i$, for all organisms k); then, three principal components for the cloud of points are calculated; finally, the cloud of points is projected orthogonally in the subspace of the three selected vectors and visualized by means of a 3D viewer. The projection of the points in the principal plane (defined by the first two principal axes) explains 58% of the variance (with the first component explaining 45% of the variance); this ensures that the PCA projection reflects well the total information embedded in the original data matrix. The three principal axes explain 65% of the variance.

Principal components analysis and the visualization of the space of organisms are done in VidaExpert, a tool developed by A.Z. A specialized 3D viewer is provided with VidaExpert. All software is available at http://www.ihes.fr/~carbone/data.htm. The interactive version of figure 1 (top) can be found at http://www.ihes.fr/~materials/organisms/htmlview.html.

Linear Analysis of the Space of Organisms

Linear discriminant analysis (LDA) (Fisher 1936) has been used to detect relevant patterns in the high-dimensional space of organisms: (1) hyperthermophiles, thermophiles, and mesophiles; (2) translationally biased (hyper)thermophiles and mesophiles; and (3) organisms with different respiratory characteristics. For each application, we construct a linear discriminant function $f(k) = a_0 + \sum_{i=1}^{64} a_i w^k_i$, where $w^k_i$ is the weight of codon i in organism k, and where the separation coefficients $a_i \in (-1, +1)$ are computed with the LDA algorithm. The purpose of LDA is to determine the coefficients $a_i$ that discriminate best a set X from a set Y, that is, optimize the ratio (mean difference)$^2$/variance. For applications 1 and 3, we find a linear discriminant function f such that the set of k's with f(k) > 0 is exactly X (all true positives and no false negatives appear); for application 2, translationally biased mesophiles can be separated with 94 true positives and 2 false negatives, specificity Sp = 97.78, sensitivity Sn = 97.78. Even if it is the full set of $a_i$ values that defines f, the largest positive (smallest negative) values $a_i$ defining f indicate codons that are preferentially used in X (Y). For 3, we trained (with leave-one-out cross-validation) the linear discriminant function f and tested prediction performance on the remaining data. We did this on organisms represented in 64 dimensions and on 2 dimensions after having applied PCA to the data set. We obtained 13.5% errors in the first case and 15.6% errors in the second. As expected, the number of variables required for optimal discrimination is greater in 64 dimensions (20) than in 2 (7). LDA was done in VidaExpert, and the training of the LDA function was done in R.

FIG. 1. Top: Principal components analysis principal plane representation of the distribution of 96 organisms according to codon weights (numbers as in table 1). Archaea (squares) and Eubacteria (circles) are colored with a preferential order of codon biases: translational bias (red), GC3-bias (green), AT3-bias (blue), strand (orange), AT-skew bias (light blue), no bias (black). Bottom: Two-dimensional plot of the 96 organisms with the $\bar{z}_{Rib}$ scale on the y-axis and the $d(w^G, w^S)$ scale on the x-axis. The organisms with $d(w^G, w^S) > 8$ are all translationally biased by the ribosomal criterion with the exception of X. fastidiosa (75), a GC3 biased genome, with $d(w^G, w^S) = 10.43$; two other organisms, T. volcanium (87) and T. pallidum (38), which are AT3 and strand biased but not translationally biased, closely approximate the threshold $d(w^G, w^S) \simeq 8$.

Distances

Let us consider two kinds of point sets representing a genome: the set of codon weights $w_i$, and its binarized form $\bar{w}_i$, where for each codon i, we approximate to 0 all weights $w_i \neq 1$, i.e., those of codons which are not preferred. Hence, this second set of points is constituted by values 0 and 1 only.

Distances between pairs of organisms are measured as "$\frac{1}{2}\ell_1$-distances" in codon space. Given two genomes $G_1$, $G_2$ and two collections $\bar{w}^1$, $\bar{w}^2$ of binarized weights $\bar{w}^1_i$, $\bar{w}^2_i$, we define the binarized distance between $G_1$ and $G_2$ to be

$$\bar{d}(G_1, G_2) = \frac{\sum_{i=1}^{64} |\bar{w}^1_i - \bar{w}^2_i|}{2} = \frac{1}{2}\ell_1(\bar{w}^1, \bar{w}^2). \qquad (2)$$

The coefficient $\frac{1}{2}$ in front of the usual $\ell_1$-distance is considered because we want to count amino acids having different preferred codons exactly once. Intuitively, this distance represents the number of amino acids with different preferred codons. We speak about "binarized $\frac{1}{2}\ell_1$-distance." Similarly, we use $d(G_1, G_2) = \frac{1}{2}\ell_1(w^1, w^2)$ when collections of weights $w^1_i$, $w^2_i$ are considered. If not otherwise specified, with "$\frac{1}{2}\ell_1$-distance" we refer to $d(G_1, G_2)$.

A Tree Describing Distances

The tree of figure 2 has been constructed using the unweighted pair group method with arithmetic mean (UPGMA) as a distance method (with the program Neighbor, integrated in the PHYLIP package, and available at http://evolution.genetics.washington.edu). Figure 2 is used to illustrate an approximated distance between pairs of organisms (such a distance, read out of the tree by adding up the values along the shortest path that connects two leaves, is a priori neither an upper bound nor a lower bound to the effective distance). No fact is inferred from the tree in this article, besides the observation that it organizes organisms in three classes reflecting AT, GC, and translational bias. The same three groups of organisms are found by using distance methods such as NJ, BIO-NJ, and NNET (from the SplitsTree 4.0 package).

Choice of Metrics and Over-Represented Families of Organisms

Choosing a meaningful metric is non-trivial. The $\frac{1}{2}\ell_1$-metric and its binarized version are justified by simple intuition. Conclusions drawn using Euclidean metrics (less obvious to justify) remain compatible with our results. The three large families collecting GC rich, AT rich, and translationally biased organisms, for example, which are visualized in the tree of figure 2 for $\frac{1}{2}\ell_1$-distances, and in the Supplementary Material online for binarized $\frac{1}{2}\ell_1$ and Euclidean distances, are comparable. In particular, relative distances among organisms in these codon spaces remain coherent.

A few over-represented bacterial groups, like γ-proteobacteria and firmicutes, bias the set of available sequenced genomes. Also, Archaea are relatively few compared to Eubacteria. As a consequence, a comparative analysis of species drawn on such a sample needs to be carefully evaluated. Namely, the clustering suggested by figure 2 might not reveal some features of the organisation due to over- and under-representation of organisms.

Prokaryotes Characteristics

For all references to the ecology, genetics, and physiology of prokaryotic organisms we follow closely the work of Balows et al. (1992) and Madigan, Martinko, and Parker (2000).

Criteria to Detect Codon Bias Signatures and Tendencies

It is commonly recognized that organisms might be subjected to codon biases of different origins. There are examples for which it is rather difficult to decide what is the most dominant codon bias, if it exists at all, as for Helicobacter pylori, for instance, a rather homogeneous genome (Lafay, Atherton, and Sharp 2000), or for Treponema pallidum, which displays both a strong GC-skew bias (Lafay et al. 1999) and a strand bias. In fact, it seems more appropriate to think of biases in a "continuum" way instead of considering them as clear-cut properties, and to think that different biases might be present at the same time, with different strengths. Numerical criteria to detect the tendency of a genome toward a bias and the strength of this bias are desirable, and we shall provide a solution to this question.

The idea supporting our method is to correlate codon biases of different origins with a common measure, the CAI values of genes. The approach is justified by the fact that CAI is a universal measure to study codon bias and it has been proven to be highly correlated with dominant biases of different nature (Carbone, Zinovyev, and Képès 2003). For each genome and each kind of bias, we compute a correlation coefficient that expresses the strength of the bias for the genome. The numerical coefficients can be used to rank different genomes with respect to a given bias, and to detect whether a genome has a tendency for a bias (in this case, the correlation coefficient is expected to be rather high) or not.

For each criterion, we suggest a threshold, that is an indicator for strong bias; formally, if T > 0 is the threshold, then for every genome G and bias B, B is a strong bias for G if and only if the coefficient computed for the bias B is bounded by T. Thresholds allow us to automatically identify strong biases and define the codon bias signature of an organism to be the collection of its strong biases. The thresholds that we propose validate all strong biases that have been detected for organisms previously studied in the literature.

FIG. 2. Tree constructed from the $\frac{1}{2}\ell_1$-distance matrix for the organisms in table 1. A three-letter code describes whether the organism is an Archaea (A, blue) or a Eubacteria (E, red), and its genus (shortened to two letters). Five consecutive positions occupied by the symbols +, *, –, _, × are interpreted as follows: 1. + translational bias by ribosomal criterion, * translational bias by both strength and ribosomal criteria; 2. + GC3-content, – AT3-content; 3. + strand bias, × no replication origin is known; 4. + GC-skew bias, – CG-skew bias; 5. + AT-skew bias, – TA-skew bias. The symbol _ indicates that no bias is present.

Even though a signature allows for an immediate "picture" of a genome, it is important to stress that signatures provide only a partial description, and that the most accurate one corresponds, in our view, to the entire collection of numerical values associated to different biases, which highlights "tendencies." We say that a genome has a strong tendency toward a bias B if the coefficient computed for B is bounded by $T - \varepsilon$, for some small $\varepsilon \geq 0$. If $\varepsilon = 0$ then we speak about a strong bias. Since the threshold T is defined for all genomes, it gives the possibility to compare genomes (and in particular to compare them through their signature). If a genome presents no strong bias on nucleotide frequencies, we say that it has a weak tendency toward the content bias that presents the largest coefficient (in absolute value). This notion describes the nucleotide evolutionary pressure of a genome, and it is best used in the analysis of rather homogeneous genomes; for instance, it allows us to say that H. pylori, despite its empty signature, has a weak tendency toward GC-skew bias (in agreement with Grigoriev [2000]).

Ribosomal Criterion

This simple statistical test detects translational bias and it relies on the idea that for translationally biased genomes, the pool of ribosomal proteins has a high CAI score compared to the average CAI value of all CDSs. In general, ribosomal proteins are not expected to be highly biased, and in particular, if bias exists, the interval within which CAI scores for ribosomal proteins vary might be rather different from genome to genome. We use this second observation to measure translational bias strength for a genome. More formally, we compute the average CAI, $\overline{\mathrm{CAI}}$, and the standard deviation $\sigma_{\mathrm{CAI}}$ for CAI values of all CDSs, and define a z-score value for those CDS r annotated as ribosomal proteins, i.e., $(\mathrm{CAI}(r) - \overline{\mathrm{CAI}})/\sigma_{\mathrm{CAI}}$. We call $\bar{z}_{Rib}$ the average of z-scores for ribosomal proteins and define the following criterion: an organism characterized by translational bias is expected to have high $\bar{z}_{Rib}$, i.e., > 1. Because ribosomal protein coding genes are highly conserved across species, they can easily be accessed by homology in organisms not yet well investigated, and this renders the criterion amenable.

Strength Criterion

This is a heuristic criterion for the detection of translational bias, which does not use any information coming from annotation of ribosomal proteins, and it consists solely of statistical analysis of CDSs. Let $w^k_i(G)$ be the weight calculated as in equation (1) over the whole set G of CDS for organism k, and let $w^k_i$ be the weight calculated over the set of most biased genes S for k. Because of the existence of a particularly strong dominant codon bias in organisms affected by translational bias (that is, the frequency of a preferred codon compared to the frequencies of its synonymous codons is much higher in the set of most biased genes S than in the whole genome G), one expects the difference between $w^k_i(G)$ and $w^k_i$ to be large, and this quantity can be used as a criterion to detect translational bias. Thus, we use the $\frac{1}{2}\ell_1$-distance between $w^k_i(G)$ and $w^k_i$, defined as

$$d(w^G, w^S) = \frac{\sum_{i=1}^{64} |w^k_i(G) - w^k_i|}{2} = \frac{1}{2}\ell_1(w^G, w^S), \qquad (3)$$

to discriminate those organisms that likely are affected by translational bias by requiring $d(w^G, w^S) > 8$ (where $d(w^G, w^S) \in [1...13]$ for the 96 organisms in table 1; see Supplementary Material). To explain the intuition behind this formula, let us consider its binarized version, namely the case where we set $\bar{w}^k_i(G) = 0$ if $w^k_i(G) \neq 1$, and $\bar{w}^k_i = 0$ if $w^k_i \neq 1$. In this simplified form, equation (3) counts the number of amino acids that have different preferred codons in the entire genome and in the set of most biased genes.

Such a numerical criterion, being based only on a statistical analysis of CDSs, is highly desirable but it does not provide a necessary and sufficient condition for translational bias. In fact, not all organisms satisfying translational bias are detected, and some extra organism, like Xylella fastidiosa, might be erroneously selected. We propose it though, because the combination of the two criteria for translational bias detection allows us to discriminate those genomes that are strongly translationally biased (that is, those satisfying both criteria) from those that are weakly so (that is, those that only satisfy the ribosomal criterion).

Content Criterion

GC3 bias is detected by comparing the GC3-content of each CDS with the corresponding CAI value, and asking the correlation coefficient (on all CDSs) to be > 0.7; a correlation < −0.7 detects AT3-bias. GC-skew bias is detected with a correlation coefficient > 0.5; a correlation < −0.5 detects CG-skew bias. Thresholds 0.5 and −0.5 define AT-skew and TA-skew bias.

Strand Criterion

Strand bias says that the most biased genes of a (circular, or linear with bidirectional replication) genome are preferentially distributed in precisely one of its strands (typically the leading strand). This definition does not depend on gene function, and it allows us to detect strand bias for genomes whose strongest bias is of any origin. In particular, we make no hypothesis on high expressivity for most biased genes, and this is in concert with the finding of Rocha and Danchin (2003), who show that essential genes, more than highly expressed ones, are located on leading strands.

To detect strand bias we verify the statistical hypothesis on the two distributions of CAI values of genes in leading and lagging strands of chromosomes (see the discussion on plasmids above). This has been done only for those genomes whose replication origin is known. We compute the t-value representative of the difference between the means of the two distributions and say that organisms with average t-value (taken as an absolute value) > 0.25 have leading-lagging strand bias.

This criterion provides a way to check for strand bias which is independent of that based on the co-existence between strand bias and GC-skew bias, proposed by Sueoka (1962) and McLean, Wolfe, and Devine (1998). (The use of this idea to detect replication sites is conceivable but out of the scope of this study.)

The Number of Codon Bias Signatures Is Limited

Translational bias is strongly correlated with GC3 content (in the sense that GC3 is the most prominent compositional content of a translationally biased organism), and most strand biased genomes in our collection are either AT3 or GC3. These observations justify the limited number

[...]

Médigue et al. 1991; Shields and Sharp 1987; Carbone, Zinovyev, and Képès 2003).

To validate the ribosomal criterion, we looked at the sets of most biased genes S determined by the evaluation of CAI values for each genome satisfying the ribosomal criterion, for which we claim a translational bias. We checked the annotation of the genes in the set of most biased genes, and we positively verified that the genes which typically are representative of translational bias, such as ribosomal, glycolytic, dehydrogenase, enolase, elongation factors, photo-system, heat-shock, and cold-shock proteins were consistently present in the set.
ofsignatureswefound,asitappearsinfigure2.Also,itis worthmentioningthatwedetectedthreegenomeswithGC- D o skewbias,threewithCG-skewbias,andthreewithAT-skew Weak Forms of Translational Bias—Mycobacterium w n bias,butonly1withaTA-skewbias. tuberculosis lo a d e The coupled use of the two criteria detecting trans- d Validation of Signatures and Tendencies lationalbiasallowsustoidentifythosegenomesforwhich fro m Tendencies and signatures obtained by applying the translational bias is weakly present. An example is M. h simple numerical criteria above (see figure 2 and tuberculosis,forwhichonlyoneofthetwostrainsH37Rv ttp s SupplementaryMaterialforthecompletelistofsignatures ((cid:2)zRib¼1.14) and CDC 1551 ((cid:2)zRib¼0.87) is characterized ://a and tendencies for genomes in table 1) are validated on by translational bias, even though both strains have ca known cases, and for some genomes, predictions have comparable codon preferences. Translational bias for this de m been drawn: speciescannotbedetectedbystrengthcriterion,andthisis ic an indicator for weak detection. This observation is .o u p Pseudomonas aeruginosa is GC3 biased but also strand compatible with the findings of de Miranda et al. (2000). .c o biased m /m Tendencies Toward Translational Bias—Rickettsia b Drawn from calculations of CAI values which were e prowazekii /a based on misleading manual selections of sets of most rtic biasedgenes (GrocockandSharp2002;GuptaandGhosh Forthoseorganismswhichonlytendtothethreshold le 2001; Kiewitz and Tu¨mmler 2000),the dominating codon T¼1, i.e., (cid:2)zRib¼1 2 (cid:1) for some small (cid:1) (cid:1) 0, one can -abs biasofP.aeruginosagaveorigintocontroversialopinions check whether ribosomal proteins are present in the set of tra onthebiologyofthisorganism.Thismakesthisgenomea most biased genes or not. R. 
prowazekii, for instance, has ct/2 goodtestcaseforourcriteria,whicharealsobasedonCAI (cid:2)z ¼0.98andasetofmostbiasedgenesSwhose88%is 2 Rib /3 analysis.InagreementwithGrocockandSharp(2002),we made of ribosomal proteins. We conclude that it has /5 4 detect that P. aeruginosa has a very strong GC3-bias (see a strong tendency toward translational bias, contrary to 7 /1 also Carbone, Zinovyev, and Ke´pe`s [2003]), but also a whathasbeenclaimedbyAndersonandSharp(1996),on 0 7 strongtendencytowardGC-skew,andastrongstrandbias. the basis of a comparison of the amino acid composition 5 9 2 patterns of 21 R. prowazekii proteins with that of 2 Genomes with Strand Bias and No GC-Skew Bias ahomologoussetofproteinsfromEscherichiacoli;there, by g it has been argued that translational selection has been u Strand bias (0.95) is detected for Haemophilus e influenzae, a genome with no GC-skew bias (0.05). Other ineffectiveinthisspecies under thebasethatsynonymous st o codonusagepatternsareroughlysimilarinthe21proteins, n organisms also display strand bias but no GC-skew bias: 1 even though the data set includes genes expected to be 3 Mycoplasma pneumoniae, Buchnera sp., M. genitalium, A expressed at very different levels. A finer analysis of the p oanrgdaCnihslmams yddisipalatryacshtroamndatbisi.asBeasniddejsusPt.aaesrtruogningotsean,doetnhceyr sripbaocseomofaalllprRo.tepirnoswiankeRz.iipprorowteaiknesziiindisicsaetepsartahbaltethfreosmetaolfl ril 201 towardGC-skew:C.pneumoniaeAR39,C.muridarum,C. 9 other proteins by a linear discriminant function with no jejuni.Someothersdisplaybothbiasesasstrongly,likeB. falsepositivenortruenegatives(Sn¼100andSp¼100). burgdorferi (with strand bias at1.89and GC-skew biasat This means that ribosomal proteins occupy a particular 0.77) (Lafay et al. 1999; Carbone, Zinovyev, and Ke´pe`s locationincodonbiasspaceandthatthereisapressureon 2003). 
codon bias (especially on codons aaa, aga), even though the set of ribosomal proteins has unusually broad dis- Translationally Biased Genomes persion (compared to the typical case where translational In figure 1 (bottom),(cid:2)z and d(wG, wS) values show bias is present). Rib that organisms known to be translationally biased are separable from all others with respect to suitable thresh- Genomes with Empty Signature and Weak Tendencies olds. Some of these organisms have been previously reportedintheliteratureandvalidateourseparation(Gouy A few genomes display an empty codon bias and Gautier 1982; Sharp and Li 1987; Sharp et al. 1988; signature, indicating the absence of strong biases of any 556 Carboneetal. particulartype,asforinstanceHelicobacterpylori(Lafay, sharply determined by preferential codon bias (on Atherton,andSharp2000),butalsoThermosynechococcus Arg1Ile1Gly1Leu),thetransitionbetweenhyperthermo- elongatus, Thermotoga maritima, and the Archaea Meth- philes and thermophiles is less clear and should be anosancina mazei. H. pylori has a weak tendency toward understood as gradual. Our set of 96 organisms confirms GC-skew bias (Grigoriev 2000); T. maritima has a weak the hypothesis of gradual transition discussed in Tekaia, tendency toward GC and GC3 (Zavala et al. 2002); T. Yeramian,andDujon(2002). elongatusandM.mazeihaveweaktendenciestowardGC3 and AT3 biases, respectively. Translational Bias for Hyperthemophiles and Mesophiles Microbial Codon Space and Lifestyle As expected, regions in codon space that collect the most GC3 and AT3 biased genomes, that is the two most A 2-dimensional projection of the 64-dimensional extremeregionsofthegenomesdistributionalongthefirst spaceofEubacteriaandArchaeaorganismsisillustratedin principal PCA axis (interpreted by GC-content), contain D figure1(top),wherethefirstprincipalPCAcomponent(x- o (hyper)thermophiles and mesophiles. 
It is surprising w axis, explaining 45% of the variance) corresponds to GC n though,toseethattranslationallybiasedorganismscluster lo content and the second principal PCA component (y-axis, a explaining 13% of the variance) corresponds to optimal in two groups localized in distinguished sites of codon ded temperaturegrowth.Anon-linearshapeinthedistribution space, one collecting (hyper)thermophiles and the other fro mesophiles. Knowing that preferred codons and isoaccep- m of points (as viewed best in 3D, not shown), roughly tor tRNA content exhibit a strong positive correlation h resembling a ‘‘horseshoe,’’splits theset oforganisms into (Ikemura 1985; Bulmer 1987; Gouy and Gautier 1982), ttps two well-defined subsets: the top half of the horseshoe is andthattRNAisoacceptorpoolsaffecttherateofpolypep- ://a made by hyperthermophiles which lie ‘‘above’’ thermo- c tide chain elongation (Varenne et al. 1984; Buckingham a philes (all Archaea in table 1 except the mesophilic Halo- de and Grosjean 1986), this means that the set of preferred m bAaqcutieferixumaeospli.c,uas,ndT.thmearthitriemea,hyapnedrtThheremrmoopahnilaicerboabcatcetreiar codons correlated with isoacceptors tRNA leading trans- ic.o tencongensis), and the bottom half by mesophiles (all lfaotriomneaslobpihasilefso.rA(hpypplyeirn)tgheLrDmAop,hwileesobissedrvifefetrheanttathpaonsitthivaet up.co Eubacteria in table 1 except the three hyperthermophilic m speciesindicatedabove,andthemesophilicHalobacterium indicator for translationally biased (hyper)thermophiles /m genomes is agg, that positive indicators for translationally b sp.).Thedivisionsuggestsaseparationofthethreelifestyle e biased mesophiles are gct, ctt, ttc, cag, act, cga, gtt, cgg, /a domains (hyperthermophiles, thermophiles, and meso- cat,tca,tat,cac,gtg,acc,aacandthatnegativeindicators rtic philes)basedoncodonbiasinagreementwiththedivision le observed by Lynn, Singer, and Hickey (2002) for 40 are acg, tcc, agc, aca, ccg, cca, agt, gca, aga. 
If selection -ab depended merely on some property of mRNAs that is s organisms,andbyKreilandOuzounis(2001)andTekaia, important under conditions of high temperature (Lynn, tra Yeramian,andDujon(2002)for27and56organisms,and Singer, and Hickey 2002), like increased mRNA stability ct/2 based on amino-acids composition. (See also Torres de 2 at high temperature for instance, it is not clear whether /3 FariasandManha˜esBonato[2002]andLobryandChessel /5 translational efficiency could be effectively distinguished 4 [2003].) 7 inhyperthermophiles.Weshowedthattranslationalbiasin /1 0 hyperthermophiles can be clearly detected through codon 7 5 Codon Bias and Optimal Growth Temperature analysis. 92 2 b Tostudycodonbiasdifferencesin(hyper)thermophilic y g andmesophilicgenomes,weusedLDAanddeterminedthat u Aerobic and Anaerobic Respiration e codonscgt,cgcarepositiveindicatorsformesophiles,while st o agg, ata, gga, cta, acg are negative indicators; agg is Organisms sharing the same respiratory character- n 1 apositiveindicatorforthermophiles;cta,agt,ggg,agg,cca, isticstendtogrouptogetherincodonspaceasillustratedin 3 A ctcarepositiveindicatorsforhyperthermophiles,whilecgc, figure 3. Linear discriminant analysis demonstrates that p cat, ggc, tcg are negative indicators. These preferential clusters in the figure are not an artifact of the 2- ril 2 0 codons code for Arg1Ile1Gly1Leu and separate meso- dimensional projection. 
Indeed, four groups are sharply 1 9 philes from (hyper)thermophiles; it is interesting to notice characterized by distinguished sets of preferred codons thatdistinguishedpreferredcodonscodingforArgseparate with highly significant (positive and negative) separation (hyper)thermophiles(agg)frommesophiles(cgt,cgc).The coefficients: tct, gtt, gcg are positive indicators, and ctt, smallnumber(4)ofthermophilesisdetectedwithpreferred tca,tcg,gtc,agt,aagarenegativeindicatorsforfacultative codonaggcodingforArg;hyperthermophilesareseparated anaerobism; tca, ctt, gac, tac, ttc, cac, aac are positive on preferential codons coding for Leu1Gly1Ser1Arg. indicators and tct, tgc, aga are negative indicators for There is no codon coding for Glu, Tyr or Val that is facultative aerobism; ccg, tta, gcg, cac, aaa, ctc, ctg, agt, preferentialonlyinmesophilesandthermophiles,oronlyin ggg,gga,gtc,cca,ggcarepositiveindicatorsandcgt,tcc, hyperthermophiles: the role played by these three amino ccc, cta, acc, gtg, tcg, cat, gaa are negative indicators for acids (Kreil and Ouzounis 2001; Tekaia, Yeramian, and anaerobism; cgc, gta, gaa, caa, tgc, ccc, cct, gtg are pos- Dujon 2002) in hyperthermophilic proteins remains trans- itiveindicatorsandaaa,ccg,ata,gggarenegativeindica- parent at the nucleotide level. We conclude that while the torsforaerobism.Withinthermophiles,facultativeaerobic division between (hyper)thermophiles and mesophiles is are represented by Pyrobaculum aerophilum, and we

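The ribosomal and strength criteria described in this section, and the way their combination separates strong from weak translational bias, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the dictionary-based inputs, and the choice of the population standard deviation are assumptions made for illustration; the text does not say which estimator was used. CAI values and codon weights are assumed to have been computed beforehand. Only the thresholds (1 for the average ribosomal z-score, 8 for the distance) are taken from the text.

```python
from statistics import mean, pstdev


def mean_ribosomal_z(cai_by_gene, ribosomal_ids):
    """Average z-score of the CAI of ribosomal-protein genes against all CDSs.

    cai_by_gene: dict mapping gene id -> CAI value (assumed precomputed).
    ribosomal_ids: ids of the CDSs annotated as ribosomal proteins.
    """
    values = list(cai_by_gene.values())
    average = mean(values)
    # Assumption: population standard deviation over all CDSs.
    sigma = pstdev(values)
    return mean((cai_by_gene[g] - average) / sigma for g in ribosomal_ids)


def strength_distance(w_genome, w_biased):
    """Half the l1-distance between two codon-weight vectors, as in equation (3).

    w_genome: dict codon -> weight computed over all CDSs (set G).
    w_biased: dict codon -> weight computed over the most biased genes (set S).
    """
    return sum(abs(w_genome[c] - w_biased[c]) for c in w_genome) / 2


def translational_bias(z_rib, distance, z_threshold=1.0, d_threshold=8.0):
    """Combine the two criteria as described in the text.

    Both criteria satisfied -> "strong"; ribosomal criterion only -> "weak";
    otherwise translational bias is not detected.
    """
    if z_rib > z_threshold and distance > d_threshold:
        return "strong"
    if z_rib > z_threshold:
        return "weak"
    return "not detected"


if __name__ == "__main__":
    # z-scores quoted in the text for the two M. tuberculosis strains;
    # the distance value 5.0 is a hypothetical placeholder below the threshold.
    print(translational_bias(1.14, 5.0))
    print(translational_bias(0.87, 5.0))
```

A genome that satisfies only the strength criterion is reported as "not detected" here, which reflects the remark that the strength criterion alone may erroneously select some organisms.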