Evidence That Mutation Is Universally Biased towards AT in Bacteria

Ruth Hershberg*, Dmitri A. Petrov

Department of Biology, Stanford University, Stanford, California, United States of America

Abstract

Mutation is the engine that drives evolution and adaptation forward in that it generates the variation on which natural selection acts. Mutation is a random process that nevertheless occurs according to certain biases. Elucidating mutational biases and the way they vary across species and within genomes is crucial to understanding evolution and adaptation. Here we demonstrate that clonal pathogens that evolve under severely relaxed selection are uniquely suitable for studying mutational biases in bacteria. We estimate mutational patterns using sequence datasets from five such clonal pathogens belonging to four diverse bacterial clades that span most of the range of genomic nucleotide content. We demonstrate that across different types of sites and in all four clades mutation is consistently biased towards AT. This is true even in clades that have high genomic GC content. In all studied cases the mutational bias towards AT is primarily due to the high rate of C/G to T/A transitions. These results suggest that bacterial mutational biases are far less variable than previously thought. They further demonstrate that variation in nucleotide content cannot stem entirely from variation in mutational biases and that natural selection and/or a natural selection-like process such as biased gene conversion strongly affect nucleotide content.

Citation: Hershberg R, Petrov DA (2010) Evidence That Mutation Is Universally Biased towards AT in Bacteria. PLoS Genet 6(9): e1001115. doi:10.1371/journal.pgen.1001115

Editor: Michael W. Nachman, University of Arizona, United States of America

Received February 9, 2010; Accepted August 9, 2010; Published September 9, 2010

Copyright: © 2010 Hershberg, Petrov. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: RH was funded by a Stanford Genome Training Program (SGTP) fellowship work supported by the National Institutes of Health grant GM077368 to DAP. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

PLoS Genetics | www.plosgenetics.org | September 2010 | Volume 6 | Issue 9 | e1001115

Author Summary

Natural selection sorts through the variability generated by mutation and biases evolution toward fitter outcomes. However, because mutation is itself not entirely random it can also bias the direction of evolution independently of selection. For instance, it is often assumed that the extreme variation observed in nucleotide content among bacteria (from ~20% to ~80% GC) is predominantly driven by extreme differences in mutational biases towards or away from GC. Here, we show that bacterial lineages that recently developed clonal, pathogenic lifestyles evolve under weak selection and that polymorphisms in these bacteria can be used as a fair proxy for mutational spectra. We analyze large sequence datasets from five clonal pathogens in four diverse bacterial clades spanning most of the range of genomic nucleotide content. We find that, surprisingly, mutation is AT-biased in every case to a very similar degree and in each case it is dominated by transitions from C/G to T/A. This demonstrates that mutational biases are far less variable than previously assumed and that variation in bacterial nucleotide content is not due entirely to mutational biases. Rather, natural selection or a selection-like process such as biased gene conversion strongly affects nucleotide content in bacteria.

Introduction

Mutation generates the variability on which natural selection acts. Mutation is not an entirely stochastic process, as it acts according to certain deterministic biases. Because of this, biases in the outcome of the evolutionary process result not only from selection but also from the biases of mutation. In order to understand evolution it is therefore necessary to elucidate mutational biases and the ways in which these biases themselves change in evolution.

Nucleotide content variation is much more pronounced in bacteria compared to multi-cellular eukaryotes [1]. GC contents in bacteria vary from less than 25% to over 75% [1–3]. Related bacteria even from relatively broad phylogenetic groupings tend to show similar genomic nucleotide content [3]. For example, bacterial genomes from the order Bacillales tend to be GC-poor, from the order Enterobacteriales to have intermediate GC contents, and from the phylum Actinobacteria to be GC-rich (Figure 1A). In addition, GC content values measured at different functional site categories (intergenic, synonymous, and non-synonymous) show highly correlated patterns of variation across bacteria [4] (Figure 1B). These observations suggest that forces determining GC content in bacteria operate both genome-wide and consistently over long periods of time. One possibility is that the main force driving nucleotide content variation in bacteria is mutation. This possibility has often been assumed true (for example see [2,4–7]). Under this assumption, clades that are GC rich are clades in which mutation has been consistently biased towards GC while clades that are AT rich are clades in which mutation has been consistently biased towards AT. If true, bacterial mutational biases must be extremely variable to be able to generate the extreme variation observed in bacterial nucleotide content.

A second possibility is that it is not variation in mutational biases that leads to variation in nucleotide content, but rather variation in the relative probabilities of fixation of A/T to G/C and G/C to A/T mutations [1,3,8]. When considering changes to nucleotide content, differences in fixation probabilities can stem both from differences in the strength and direction of natural selection and differences in the rates of biased gene conversion (BGC) [1,9,10]. Natural selection affects the probability of fixation of an allele based on the allele's fitness advantage or disadvantage and the effective population size (Ne) of the organism in question. Similarly, BGC is also dependent on Ne and on the advantage or disadvantage the allele has. Here however, this advantage or disadvantage is not in fitness but rather in the increased or decreased probability of the allele to be passed on to the next generation through gene conversion [1,10]. This increase or decrease is determined by recombination rates and by the conversion bias, which has been shown in many eukaryotes to be in favor of GC nucleotides compared to AT nucleotides [1,9]. A recent study that showed that in Escherichia coli regions of low recombination tend to be more AT rich demonstrates that BGC may affect nucleotide content in bacteria in a similar manner [11].

In order to gain insight into mutational biases it is necessary to investigate the results of mutation in isolation from those of selection and BGC. When effective population sizes are small, the efficacy of both natural selection and BGC is severely reduced relative to stochastic processes and therefore sequence evolution is affected strongly by mutational biases. Mutation-accumulation experiments artificially reduce Ne of evolving laboratory cultures [12] and can thus be used to assess mutational biases in culturable bacteria. Similarly, reporter constructs have also been used to estimate mutational biases [13,14]. However, without knowing the relative amount of time bacteria spend in different growth phases (logarithmic vs. stationary) and given that mutational rates and patterns vary between growth phases [15,16] it could be difficult to estimate the true mutational biases operating in nature using such experimental approaches. An additional approach is to examine nucleotide substitutions at sites that are expected not to be subject to selection due to protein functionality, such as pseudogenes [17], or fourfold degenerate sites [18]. This approach is also problematic because while pseudogenes and fourfold degenerate sites are expected to be under no, or low selection for protein functionality, they should be subject to the same levels of selection on nucleotide content as the rest of the genome.

A good way to estimate mutational biases is to analyze the patterns of single nucleotide polymorphisms (SNPs) within species. Population genetic studies have shown that natural selection and other selection-like processes are less efficient in affecting patterns of nucleotide polymorphisms among very closely related strains compared to nucleotide differences between distantly related strains or species [19,20]. Thus SNPs should better reflect the mutational patterns compared to substitutions between species. The analysis of SNPs has been used to investigate the mutational biases of a number of AT-rich eukaryotic genomes, such as Drosophila [21–23]. However, using this methodology in bacteria has been problematic due to a severe blurriness in species boundaries among prokaryotes [24]. As a result of such blurriness quite often strains sharing a "species" name (such as E. coli) are in fact quite diverged. While it is difficult to define species in bacteria, severely reduced selection has been observed among very closely related bacterial strains of some strictly clonal pathogens, such as strains belonging to the Mycobacterium tuberculosis cluster (MTBC) [25]. This can be explained by the fact that strains belonging to such lineages of pathogens are extremely closely related and thus these lineages may be good proxies of "species" and nucleotide differences between them can be viewed as polymorphisms. In addition, the lifestyle and clonality of such pathogens is likely to lead to small Ne, further reducing efficacy of natural selection [25]. The patterns of SNPs among closely related strains of such clonal pathogens should thus reflect directly the predominant mutational biases.

Here we estimate mutational biases by analyzing SNPs extracted from large sequence datasets of five lineages of clonal pathogens (including MTBC) from four broad clades of bacteria that span virtually the whole bacterial phylogeny and the range of bacterial nucleotide contents (Figure 1A): Bacillales (AT rich), Enterobacteriales (intermediate GC), Actinobacteria (high GC), and Burkholderiales (high GC). We find that in all lineages mutation is biased towards AT, and that G/C to A/T transitions are always predominant. Previous studies indicate that mutation may be universally AT-biased in eukaryotes [21,26–31]. Our results together with additional studies that have focused on Enterobacteriales (E. coli, Shigella, and Salmonella typhimurium) [32,33] demonstrate that mutation may be universally AT-biased in bacteria as well. These findings contradict the long-held view that mutational biases are the main contributors to variation in bacterial nucleotide content and are therefore highly variable among bacteria. Rather they suggest that nucleotide content in bacteria is strongly affected by variation in the relative rates of fixation of AT to GC and GC to AT mutations and that mutational biases are far less variable than previously thought.

Results

Sequence data from lineages of clonal pathogens are extremely suited to the study of bacterial mutational biases

We focused on five lineages of clonal pathogens from four diverse bacterial clades (Table 1). The five lineages we investigated are unique in their suitability for this type of analysis because they provide us with sufficient amounts of available sequence data for sufficiently closely related strains in which we can demonstrate a genome-wide relaxation in the efficacy of natural selection. The chosen strains are indeed very closely related with each lineage exhibiting less than 0.5 pairwise differences per gene (Table 1). However, because of the availability of multiple whole genome sequences in each lineage the total number of SNPs is substantial (Table 1), ranging from 165 to 1877. In addition, these lineages are thought to be clonal [34]. Thus, horizontal gene transfer should occur only rarely, if at all, and should not strongly influence our ability to infer the ancestral and derived states of mutations. Finally, the inference of SNPs from such closely related sequences is almost trivial, as alignment programs do much better when sequences are highly similar. Therefore, we expect to have no biases introduced through misalignment of sequences.

We assessed whether the patterns of SNPs in these data are indeed weakly affected by natural selection by estimating the ratio of non-synonymous and synonymous differences per non-synonymous and synonymous site (dN/dS) [35,36] across all alignable proteins within each dataset (Materials and Methods). If selection is strong, dN/dS should be much smaller than 1 as it would efficiently remove most non-synonymous mutations [35,36]. For example, comparisons of E. coli strains yield dN/dS values of approximately 0.05 [37]. In contrast, for MTBC where multiple lines of evidence suggest that natural selection is severely reduced [25], dN/dS goes up to 0.59. In our dataset, dN/dS values for the other four lineages range between 0.45 and 0.64 (Table 1, Materials and Methods). This suggests that selection is indeed relaxed in a genome-wide manner in these genomes and thus that the pattern of SNPs should be reflective of the mutational biases in these lineages.

Figure 1. Phylogenetic and genomic variation in GC content. (A) The phylogeny of the four broad clades examined in this study. It was built using the iTOL webpage (http://itol.embl.de/). Average GC content of the different broad clades are indicated on the margins. Small blue triangles represent the five lineages of clonal pathogens used in the analysis. (B) Genome wide observed GC content of synonymous and non-synonymous sites correlate with the intergenic GC content across bacterial genomes. doi:10.1371/journal.pgen.1001115.g001

Relaxed selection over long evolutionary timescales can lead to extreme genome reduction and the loss of many repair pathways, which could affect mutational patterns [1,38]. The pathogens we use in this study have suffered only a short-term relaxation in the efficiency of selection and there is no indication that any of them have lost repair functions. To further substantiate that these pathogens are not likely to have suffered loss of repair functions we examined whether any of the repair genes annotated in close relatives of the examined pathogens that are not evolving under inefficient selection have been lost in the pathogens. We found that in B. anthracis, B. mallei, S. typhi and Y. pestis there has been no loss of repair genes and that in all cases the repair genes are highly similar to those found in the outgroups (Materials and Methods, Table S1). In the case of MTBC, since the closely related outgroup strain M. canettii is not fully sequenced we could only compare the genes present in fully sequenced strains of MTBC to those present in a more distantly related outgroup, M. marinum. All but one of the repair genes found in M. marinum are also found in MTBC (Table S1). The M. marinum gene that is not found in MTBC is an un-named gene of unclear function. These results together with a lack of previous evidence for loss of repair functions in these well studied pathogens make it unlikely that these pathogens have lost repair functions. It is even less likely that all of them suffered similar losses of repair functions.

Mutation is AT-biased independently of the current nucleotide content of bacterial clades

We polarized the SNPs (Materials and Methods) and classified changes into six possible types of mutations (G/C to A/T, G/C to T/A, G/C to C/G, A/T to G/C, A/T to C/G, and A/T to T/A). The relative rate of each of the six mutation types was calculated after normalizing for the current GC content at the studied positions (Materials and Methods). For all five lineages, irrespective of current genomewide nucleotide content, the predominant mutation is G/C to A/T transition (Figure 2).

It is important to remember that relaxation of selection in the studied lineages is fairly recent and that nucleotide content is a slowly evolving trait. Therefore, if driven by selection or BGC, nucleotide content should not have had time to reach a new mutational equilibrium. However, if nucleotide content is driven predominantly by mutation, and selection and BGC do not strongly affect nucleotide content, the genomic nucleotide contents should already be at the mutational equilibrium. We test whether nucleotide content is at equilibrium by comparing the number of GC→AT and AT→GC changes observed in each dataset. Under equilibrium these numbers will be equal. The results of such comparisons (Table 2) clearly show that the lineages with intermediate (Salmonella typhi & Yersinia pestis) and high (Burkholderia mallei & MTBC) nucleotide contents are currently far from equilibrium and that GC→AT changes are much more frequent. These results are statistically significant for all but the intergenic dataset of B. mallei, in which a small number of SNPs leads to very low statistical power (Table 2).

The above results indicate that under continued relaxed selection the lineages with intermediate or high GC content will evolve to become more AT rich. It is easy to calculate the expected equilibrium GC content based on the mutational rates we found (GCeq, Materials and Methods) [1,8,27]. Such a calculation shows that for all lineages GCeq is lower than 50% (Figure 3, Table 3). In other words, mutational biases by themselves should lead to AT-rich genomes. Furthermore, if nucleotide content is primarily determined by the mutational biases of the genome, GCeq should approximate the observed GC content genomewide. However, for the four clonal pathogen lineages with either intermediate or high GC contents, GCeq is always significantly lower than the current genomewide GC at all types of sites (Figure 3, Table 3). Finally, no significant correlation was observed between GCeq and current GC across all site categories in all lineages (r=0.09, Spearman correlation, n=14, P ≤ 1. Note that the 14 data points examined are not entirely

Table 1. Clonal pathogen lineages analyzed in study.
| Clonal pathogen | Source of SNPs | Outgroup | Max pairwise variability (a) | dN/dS | # SNPs (intergenic) | # SNPs (non-synonymous) | # SNPs (synonymous) |
|---|---|---|---|---|---|---|---|
| Bacillus anthracis | Alignments of 18 fully and partially sequenced strains | B. thuringiensis | 0.3 | 0.58 | 322 | 239 | 112 |
| Salmonella typhi | SNPs provided by Holt et al. [52] based on the sequencing of 19 strains | None (phylogenetic tree) | 0.4 | 0.45 | 260 | 961 | 656 |
| Yersinia pestis | Alignments of 7 fully sequenced strains | Y. pseudotuberculosis | 0.2 | 0.64 | 118 | 345 | 162 |
| Burkholderia mallei | Alignments of 11 fully and partially sequenced strains | B. thailandensis | 0.1 | 0.47 | 44 | 70 | 51 |
| MTBC | Alignment of 89 genes sequenced in 107 strains | M. canettii | 0.5 | 0.59 | NA | 226 | 136 |

(a) The average pairwise diversity per gene of the two most diverged strains within the lineage.
doi:10.1371/journal.pgen.1001115.t001

independent. The calculated P-value should therefore be taken with a grain of salt, and is only provided to demonstrate that GCeq does not appear to be correlated to current GC).

In order to show that these results are not an artifact of sequencing errors, we recalculated GCeq after removing all cases in which the derived allele appears only in a single genome (singletons). While this reduces the number of SNPs per dataset and increases the error of the estimate, GCeq values remain lower than the current GC for all datasets from lineages with intermediate or high GC contents and the results remain statistically significant in all the datasets except for the B. mallei intergenic sites (Table S2). These results together show that in five lineages of clonal pathogens, belonging to four broad clades that span a large portion of bacterial phylogeny (Figure 1A) mutation is consistently biased towards AT. Furthermore, these results suggest that in bacteria that are GC rich or have intermediate GC contents, it is an elevated fixation of GC-enriching mutations rather than a change in mutation bias that drives the elevated GC content.

Figure 2. Relative rates of the six nucleotide pair mutations. The most common mutation is always G/C to A/T transitions. The rates are normalized for the unequal nucleotide content of the five different lineages (Materials and Methods). (A) non-synonymous SNPs. (B) synonymous SNPs. doi:10.1371/journal.pgen.1001115.g002

Differences between genomes that are more distantly related to each other should reflect the effects of natural selection and/or BGC better. The GCeq calculated based on such differences should be more similar to the observed GC content than that calculated based on SNPs from more closely related strains. To examine these predictions we analyzed two additional datasets in which sequences are still closely related enough to create reliable

Table 2. Nucleotide contents of clonal pathogens with intermediate and high GC contents are far from equilibrium.
| Clonal pathogen | Current GC content | Sites | # GC→AT (a) | # AT→GC (a) |
|---|---|---|---|---|
| MTBC | High | Synonymous | 103 (84, 123) | 18 (10, 26) |
| MTBC | High | Non-synonymous | 127 (105, 151) | 58 (43, 73) |
| B. mallei | High | Synonymous | 45 (33, 59) | 2 (0, 5) |
| B. mallei | High | Non-synonymous | 57 (42, 73) | 7 (2, 12) |
| B. mallei | High | Intergenic | 27 (17, 38) | 12 (6, 19) |
| Y. pestis | Intermediate | Synonymous | 116 (95, 138) | 37 (26, 48) |
| Y. pestis | Intermediate | Non-synonymous | 230 (201, 259) | 79 (63, 97) |
| Y. pestis | Intermediate | Intergenic | 69 (52, 85) | 36 (25, 47) |
| B. anthracis | Low | Synonymous | 44 (32, 57) | 64 (49, 79) |
| B. anthracis | Low | Non-synonymous | 136 (114, 158) | 75 (58, 93) |
| B. anthracis | Low | Intergenic | 141 (119, 166) | 151 (128, 175) |
| S. typhi | Intermediate | Synonymous | 570 (524, 619) | 79 (62, 97) |
| S. typhi | Intermediate | Non-synonymous | 707 (654, 760) | 220 (192, 248) |
| S. typhi | Intermediate | Intergenic | 189 (164, 215) | 65 (50, 81) |

(a) 95% confidence intervals appear in parentheses. Numbers appear in bold if there is a statistically significant (p < 0.05) difference between number of GC→AT and number of AT→GC changes.
doi:10.1371/journal.pgen.1001115.t002

multiple sequence alignments for a relatively large number of their genes and intergenic regions, yet show higher divergence than the five lineages in Table 1 (Table 4). The first dataset was created by aligning sequences from Y. pestis and Yersinia pseudotuberculosis and examining the differences between these two lineages. The second dataset was created by aligning sequences from 10 strains of Burkholderia pseudomallei. These two datasets were selected because they can be naturally paired to two of the five datasets in Table 1 (B. mallei and Y. pestis). dN/dS values for these comparisons are lower than in the other five datasets (χ² test, P < 0.00001, Table 4), suggesting that the effects of selection are indeed more evident in these data. As predicted, the GCeq values derived from differences between Y. pestis and Y. pseudotuberculosis and from B. pseudomallei comparisons are more similar to the current GC content at all types of sites compared to GCeq calculated from the paired datasets of clonal pathogens (Figure 4). These results are statistically significant for the non-synonymous and synonymous site comparisons (Figure 4, Table 4). For intergenic comparisons the small number of identified B. mallei SNPs and Y. pestis/Y. pseudotuberculosis differences makes it difficult to demonstrate statistical significance. Furthermore, the GCeq values calculated based on the more diverged datasets are significantly correlated with the current genomewide GC content in Y. pestis and B. pseudomallei (r=0.99, Spearman correlation, n=6, P ≤ 0.03). It is important to note that the GCeq values derived from differences between Y. pestis and Y. pseudotuberculosis and from B. pseudomallei are still lower than current GC content for all comparisons (Figure 4). This indicates that even a relatively slight reduction in selection can lead to a certain reduction in GC content. A similar result was recently found for E. coli and Shigella, where a relatively slight reduction in the efficiency of selection [37] was shown to lead to an excess of GC→AT changes in Shigella compared to E. coli [32].

Nucleotide content of obligate intracellular bacteria matches those predicted based on estimates of mutational biases

Obligate intracellular bacteria are known to be evolving under extremely prolonged relaxed selection [38]. It is therefore possible that in these organisms nucleotide content will be strongly affected by mutation. The genomes of obligatory intracellular bacteria tend to be AT rich [38], indicating that mutation may be biased towards AT in these organisms. However, it is unknown whether the mutational biases of obligate intracellular bacteria reflect those of their clade members that are not living this lifestyle, since these organisms lack many of the repair genes which are present in their other clade members that are not evolving under such relaxed selection [38]. It is unclear how the absence of repair pathways affects the mutational biases of these bacteria. Lind and Anderson [33] have carried out mutation accumulation experiments in Salmonella typhimurium strains lacking the major DNA repair systems involved in repairing common spontaneous mutations caused by deaminated and oxidized DNA bases. They found that in such strains of S. typhimurium mutation is strongly AT-biased but that unlike what we find for our pathogens, when these repair pathways are absent transversions become much more frequent than transitions [33]. The question however remains of whether loss of repair pathways changes the AT bias of mutation in obligate intracellular bacteria.

It is also unclear whether the nucleotide contents of obligate intracellular bacteria are currently at mutational equilibrium. Even though these organisms have been subject to prolonged, severe relaxations of selection it is possible that they have not yet reached nucleotide content equilibrium. It is also theoretically possible that the strength of selection in favor of GC nucleotides increases as genomes become more AT rich (a form of synergistic epistasis [26,39,40]). If this is the case it is possible that nucleotide content in obligate intracellular bacteria may be at a new selection-mutation equilibrium.

The questions of whether the mutational biases of obligate intracellular bacteria resemble those of their other clade members and of whether their nucleotide content is at mutational equilibrium can be addressed by comparing our estimates of GCeq to the GC content of obligatory intracellular bacteria from the same clades. Among the four analyzed clades, two (Enterobacteriales, and Actinobacteria) include sequenced obligatory intracellular bacteria. Intriguingly, the GC contents of the obligate intracellular bacteria within these two clades are similar to the values of GCeq which we

Figure 3. Equilibrium GC content (GCeq) and the observed GC content in the five studied clonal pathogen lineages. GCeq is calculated from the estimated rates of AT to GC and GC to AT mutations ($r_{AT \to GC}$ and $r_{GC \to AT}$) as $GC_{eq} = \frac{r_{AT \to GC}}{r_{AT \to GC} + r_{GC \to AT}}$.
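As an illustrative sketch rather than the authors' own procedure, the caption's formula can be evaluated directly from polarized SNP counts. The helper name `gc_eq_from_counts` is hypothetical, and the normalization used here (dividing each count by the fraction of sites that can mutate in that direction) is an assumption standing in for the procedure described in Materials and Methods. The inputs are the S. typhi synonymous-site counts from Table 2 and the current synonymous GC content from Table 3.

```python
def gc_eq_from_counts(n_gc_to_at, n_at_to_gc, gc_fraction):
    """Equilibrium GC content from polarized SNP counts (hypothetical helper).

    Assumption: the relative rate of each class of change is its count divided
    by the fraction of sites that can mutate in that direction.
    """
    if not 0.0 < gc_fraction < 1.0:
        raise ValueError("gc_fraction must lie strictly between 0 and 1")
    r_gc_to_at = n_gc_to_at / gc_fraction          # relative rate per GC site
    r_at_to_gc = n_at_to_gc / (1.0 - gc_fraction)  # relative rate per AT site
    total = r_at_to_gc + r_gc_to_at
    if total == 0:
        raise ValueError("at least one observed change is required")
    # GCeq = r(AT->GC) / (r(AT->GC) + r(GC->AT))
    return r_at_to_gc / total


# S. typhi synonymous sites: 570 GC->AT and 79 AT->GC changes, current GC 62.3%
s_typhi_syn = gc_eq_from_counts(570, 79, 0.623)
print(f"GCeq = {100 * s_typhi_syn:.1f}%")
```

Under these assumptions the sketch gives a GCeq of about 18.6% for S. typhi synonymous sites, in line with the value listed in Table 3, which suggests the assumed normalization is at least close to the one used in the study.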
GCeq values are significantly lower than the observed GC content at all site categories (intergenic, synonymous and non-synonymous) and for all four lineages of clonal pathogens with either intermediate or high GC contents. MTBC intergenic SNPs were not available for analysis. Error bars depict 95% confidence intervals for GCeq. No correlation is observed between GCeq and current GC for the clonal pathogen lineages (P ≤ 1). doi:10.1371/journal.pgen.1001115.g003

calculated using clonal pathogen SNP data from their corresponding clades: GCeq values calculated based on SNPs from the Enterobacteriales S. typhi and Y. pestis were ~22% and ~25%, respectively. While other members of the order Enterobacteriales have a current GC content of 43–57%, obligate intracellular bacteria within this order have GC contents of 23–30%. Within the phylum Actinobacteria all genomes have GC content of over 50% except for the only sequenced obligate intracellular bacterium, Tropheryma whipplei, which has a GC content of 46%. This is very close to the GCeq calculated for the Actinobacteria MTBC (~42%). These results suggest that the GC content of obligate intracellular bacteria corresponds to what we would expect to find in their non-obligate intracellular clade members at equilibrium if nucleotide content had been determined by mutation alone. This anecdotal evidence suggests that even though obligate intracellular bacteria do tend to lose many repair functions the extent of their mutational AT bias resembles that of their other clade members. Additionally, it appears that nucleotide contents of these bacteria are close to mutational equilibrium. The observation that obligate intracellular bacteria across additional clades tend to have high AT contents relative to

Table 3. Comparisons of current GC content and GCeq for five clonal pathogen lineages.
| Clonal pathogen | Current GC intergenic | GCeq intergenic (a) | Current GC non-synonymous | GCeq non-synonymous (a) | Current GC synonymous | GCeq synonymous (a) |
|---|---|---|---|---|---|---|
| Bacillus anthracis | 32.5 | 34 (29.2, 38.9) | 39.5 | 26.5 (21.3, 32.2) | 23.4 | 30.8 (23.8, 40) |
| Salmonella typhi | 46.1 | 22.8 (17.8, 27.8) | 50.1 | 23.8 (21.2, 26.6) | 62.3 | 18.6 (15.7, 22.8) |
| Yersinia pestis | 42.5 | 27.8 (20.2, 36.3) | 48.1 | 24.1 (19.8, 29) | 51.7 | 25.4 (18.9, 32.7) |
| Burkholderia mallei | 67.8 | 48.3 (29, 63.3) | 61.1 | 16.2 (5.3, 26.6) | 90.5 | 29.9 (0, 54) |
| MTBC | 62.8 | Missing data | 60.8 | 41.5 (33.8, 48.5) | 80.2 | 41.6 (29.1, 52.3) |

(a) 95% confidence intervals appear in parentheses. Bold font indicates GCeq values are significantly different from corresponding current GC contents.
doi:10.1371/journal.pgen.1001115.t003

their non-obligate intracellular clade members, and to have GC contents lower than 50% (Table S3, Figure S1) may therefore support the generality of AT-biased mutation in bacteria beyond the four phylogenetically diverse broad clades examined in this study.

Discussion

Differences in the patterns of mutation have been assumed to be the most likely explanation for the vast variation observed in nucleotide content across bacteria (for example see [2,4–7]). Given that bacterial nucleotide content varies between >75% GC and >75% AT, this assumption implies that point mutation biases are extremely variable among bacteria. Our results demonstrate that mutational biases are in fact very similar across bacteria. Mutation appears to be dominated by C/G to T/A transitions and is AT-biased in every studied instance, even in bacteria with high genomic GC contents. At the same time, it is important to note that mutational biases need not be entirely constant in different bacteria. In fact, our results demonstrate that in the Actinobacteria mutation appears to be less strongly biased towards AT than in the other clades examined.

Due to a severe relaxation in natural selection, the recent evolution of the clonal pathogens investigated in this study should be predominantly driven by mutation. However, it is important to note that natural selection is not entirely absent in these pathogens and that mutations of severe deleterious effect will still be removed by selection. In MTBC we have evidence that selection is relaxed to the point that it does not distinguish between mutations that alter amino acids that are conserved across all Mycobacteria and are presumed to be under strong constraint and those that are variable in Mycobacteria and are likely to be under much weaker constraint [25], indicating that only extremely strongly deleterious mutations are being purged by selection. Yet it is still possible that our estimates of mutational biases are somewhat affected by the residual effects of natural selection on strongly deleterious mutations. If such residual effects of selection affect our results they are likely to make our estimates of the extent of the AT bias somewhat conservative. This is demonstrated by our finding that within a given clade a stronger bias towards AT is observed in clonal pathogens where selection is severely relaxed compared to closely related lineages subject to more efficient selection. This being said, our observation that within a broad clade obligate intracellular bacteria evolve to a GC content that matches very closely the one predicted based on our estimates of mutation suggests that our estimates of the extent of AT bias are quite reliable.

In an additional study, which is published back-to-back with our study in this issue of PLoS Genetics, Hildebrand et al. investigated mutational patterns by examining changes in fourfold degenerate codons in a large number of bacterial lineages with divergence under 10% at these sites [41] and argue for the AT bias of mutation in GC rich bacteria. The argument of Hildebrand et al. is complicated by two factors. First, natural selection in such lineages can remain strong as exemplified by E. coli [37] and B. pseudomallei (studied here) and much stronger than in the five clonal pathogen strains studied here. Second, the inference of mutational patterns from fourfold degenerate sites alone is complicated by a strong correlation of the GC content of selectively favored codons at the level of translation and the GC content of genomes [42], making it possible that some of the detected effects are related to natural selection at the level of translation. Both of these factors imply that the mutational patterns inferred by Hildebrand et al. are much less precise than those presented here. Nevertheless, the totality of the evidence in Hildebrand et al. is consistent with the generality of

Table 4. Summary of results for two more distantly related lineages.

| Dataset | Max pairwise variability (a) | dN/dS | Current GC intergenic | GCeq intergenic (b) | Current GC non-synonymous | GCeq non-synonymous (b) | Current GC synonymous | GCeq synonymous (b) |
|---|---|---|---|---|---|---|---|---|
| Y. pestis / pseudotuberculosis | 4.9 | 0.1 | 42.5 | 33 (11, 59.7) | 48.1 | 36.3 (29, 44.3) | 51.7 | 47 (42.8, 51.2) |
| B. pseudomallei | 3.3 | 0.13 | 65.3 | 49.9 (41.1, 58) | 61.2 | 47 (42.8, 50.7) | 90 | 58.9 (55.5, 61.7) |

(a) The average pairwise diversity per gene of the two most diverged strains within the lineage.
b95%Confidenceintervalsappearinparenthesis.BoldfontindicatesGCeqvaluesaresignificantlydifferentfromcorrespondingcurrentGCcontents. doi:10.1371/journal.pgen.1001115.t004 PLoSGenetics | www.plosgenetics.org 8 September2010 | Volume 6 | Issue 9 | e1001115 MutationIsBiasedtowardsATacrossBacteria Figure 4. Comparison between GC calculated using data from clonal pathogens and GC calculated using data from more eq eq divergedlineages.GC estimatedusingmoredivergedlineagesisalwaysmoresimilartoandsignificantlycorrelatedwithcurrentGCcontent eq values(P=0.03). doi:10.1371/journal.pgen.1001115.g004 AT-biasedmutationacrossbacteria,especiallygiventheirfocuson muchmorefrequentthantransitions[33].Incontrast,inallofthe a much largernumber of bacterial lineagesthan presented here. fivepathogensweexaminedmutationwasstronglybiasedtowards A prolonged severe relaxation of selection in obligate intracel- transitions, rather than transversions. We cannot show conclu- lular bacteria has led to massive loss of repair genes in these sively that no repair functions have been lost in any of these five bacteria [38]. In the clonal pathogens studied here relaxation of pathogens.However,webelievethatthelackofevidenceforany selection occurred much more recently and there is no evidence suchloss,theconsistencyofthemutationalbiasesobservedacross forlossofrepairfunctionsinthesegenomes.Infact,weshowthat allfivedatasetsexamined,andthebiastowardstransitionsrather noneofthesepathogenshavelostanyoftherepairgenesencoded than transversions, make it reasonable to assume that our results bytheircloselyrelatedoutgroupsthatareevolvingunderefficient are unaffected by asignificant lossinrepair functions. selection. 
Additionally, a previous study that investigated the Weshowthatwithinbroadcladesobligateintracellularbacteria patternofmutationinstrainsofS.typhimuriumwithdeficientrepair evolvetoaGCcontentthatmatchesverycloselytheGCcontent functions has shown that in such strains transversions become predictedatequilibriumbasedonourestimatesofmutationrates PLoSGenetics | www.plosgenetics.org 9 September2010 | Volume 6 | Issue 9 | e1001115 MutationIsBiasedtowardsATacrossBacteria of clonal pathogens belonging to the same clade. This together It is possible to use the mutational rates we calculated to with the long-standing observation that the vast majority of estimate the strength with which selection, and/or BGC were obligate intracellular bacteria tend to have extremely AT rich acting on nucleotide content in the examined clonal pathogens genomes [38] supports the generality of AT-biased mutation in prior to their recent relaxations of selection. Such calculations bacteria. Itishoweverimportanttonotetherecentidentification (Table S4) show that such selection is always weak (s#1/N), e ofanobligateintracellularbacterium,CandidatusHodgkiniacicadicola which would be expected considering GC content is always thathasaGC-richgenome(58.4%GC)[43].Itispossiblethatthis intermediate. This demonstrates that the selective or BGC bacterium constitutes an exception to the universality of AT- advantage of GC over AT nucleotides need not be high in order biasedmutationandthatmutationinthisorganismisGCbiased. toexplainthevastvariationinGCcontentsobservedinbacteria. However, it is also possible that this is not the case and that this However, it is important to note that such calculations make a bacterium is GC-rich for other reasons (such as due to natural number of assumptions that may not be reasonable. One such selection or BGC). 
Further studies will be needed to determine assumption is that selection acts uniformly across sites and that whether Candidatus Hodgkinia cicadicola indeed has exceptional there is no synergistic epistasis (i.e. that the intensity of selection mutational biases. does not change with changes in nucleotide content). An By demonstrating that variation in nucleotide content in additional assumption is that there are no competing selective bacteriaisnotgenerallydrivenbydifferencesinmutationalbiases forces.Thissecondassumptionisclearlyincorrectwhenitcomes we demonstrate that natural selection or another selection-like tonon-synonymoussites.Evenifselectiononnucleotidecontentin process such as BGC must play a dominant role in nucleotide these sites were strong enough to drive them to use only GC contentvariation,particularlyindrivingintermediatetohighGC nucleotides it is highly likely that selection for protein function contentinmanybacteriallineages.Atthispointitisunclearhow would not allow this to happen. It is therefore unclear whether much of this variation is driven by selection and how much by selection on nucleotide content isin factalways weak. BGC. In this study we demonstrate the great utility of clonal If natural selection plays a strong role in determining GC pathogens for the study of mutational biases in bacteria. We content it suggests that in many bacteria there are no truly investigated five such clonal pathogens from four very diverse neutrally evolving sites. The nature of such selection remains clades. 
We showed that in every studied case and across all site obscure.BecauseGCcontentcorrelatesstronglyacrosscodingand categories point mutation is consistently biased towards AT and noncoding sites genome-wide, natural selection acting on GC that the most frequent mutations are always G/CRA/T content probably relates to genome-wide functions such as transitions, thus demonstrating that the biases of point mutation replication or DNA maintenance and is less likely to be related aremuchlessvariablethanwaspreviouslyassumed.Byidentifying to gene expression. Previous studies attempted to associate GC additional bacteria evolving under strongly relaxed selection and content with environmental factors such as growth temperature conductingdeepsequencingofsuchbacteriaitshouldbepossible [44,45], exposure to UV [8], oxygen requirements [46], and the to address additional questions regarding mutational patterns of ability to fix nitrogen [47]. While these studies were thought by bacteria, including the variability in the rates of insertions and sometobeinconclusive[3,44,48–51]theyprovidethebestcurrent deletions across bacteria and mutation clustering along bacterial explanations for the possible involvement of selection in chromosomes. determining nucleotide content. However, considering that Recent studies have shown that mutation appears to be bacteria belonging to such broad clades as Actinobacteria have universally AT-biased in eukaryotes [21,26–31]. Our results similargenomicnucleotidecontentseventhoughtheyareexposed demonstrating that this may also be the case in prokaryotes to different environments it becomes tempting to speculate that thereforeshowthatmutationmayinfactbeAT-biasedinallliving environmental variables may not be the only underlying organisms(althoughitisimportanttonotethatwedonotyethave determinants for the natural selection acting on nucleotide good estimates of the mutational biases of Archea). Not only is content. 
It is possible that selection on nucleotide content is also mutation AT-biased in all instances studied, but the specific driven by more intra-organismal factors that can affect entire pattern of mutation is always consistent. The most common clades irrespective of environment. Examples of such factors can mutationsarealwaysG/CtoA/Ttransitions.Theseresultsmake betheabilityofthereplicationmachinerytoworkbetteronGCor ittemptingtospeculatethatthepredominantmutationsaresimply AT rich sequences, DNA packaging, defense against phages or the result of the lability of cytosine to deamination, and that this creatingbarriersforhorizontalgenetransfer.Morestudiesneedto pattern shows through despite possible variability in DNA be carried out to probe the possible involvement of selection in replication andrepair mechanisms[27]. determining bacterial nucleotide content. In order for an increase in BGC to explain GC richness, Concluding remarks recombinationshouldbepervasiveenoughinGCrichbacteriato Inthisstudyweuseddatafromfivestrictlyclonalpathogensto drive GC contents that are elevated substantially above those analyze the variation in point mutation biases in bacteria. These observedeveninsexuallyreproducingeukaryotes.Itisverylikely pathogens are uniquely suitable for such analyses as they can be that per generation advantage given to GC nucleotides through shown to be evolving under selection that is severely inefficient geneconversion(whichisdeterminedbyrecombinationratesand relative to stochastic processes. Unlike obligate intracellular by the conversion bias [1,10]) is significantly higher in sexually bacteria that have been evolving under inefficient selection for reproducing eukaryotes than in prokaryotes for which recombi- long evolutionary times and have lost much of their repair nation is assumed to be less frequent [1]. 
However, it is possible pathways these clonal pathogens have experienced only a short- that BGC may still affect some bacteria more strongly than termrelaxationinselectionefficiencyandarelikelytohaveintact eukaryotes, if N is increased in these bacteria by a higher factor repair mechanisms. Their mutational biases should therefore e than the factor by which the advantage given to GC nucleotides reflect those of their other clade members that are not subject to through gene conversion is decreased. This is an intriguing inefficientselection.Wedemonstratedthateventhoughthesefive possibility, opening up new avenues of research into recombina- pathogens belong to four very diverse clades with very different tion ratesandvariability inN among bacteria. nucleotidecontentsmutationinallofthemisbiasedtowardsAT, e PLoSGenetics | www.plosgenetics.org 10 September2010 | Volume 6 | Issue 9 | e1001115