Journal of Medical Microbiology (2014), 63, 293–308. DOI 10.1099/jmm.0.064220-0
Immunostimulatory CpG motifs in the genomes of
gut bacteria and their role in human health and
disease
Ravi Kant,1 Willem M. de Vos,1,2,3 Airi Palva1 and Reetta Satokari1
1Department of Veterinary Biosciences, University of Helsinki, PO Box 66, FI-00014, Helsinki, Finland
2Haartman Institute, University of Helsinki, PO Box 21, FI-00014, Helsinki, Finland
3Laboratory of Microbiology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
Correspondence: Reetta Satokari, reetta.satokari@helsinki.fi
Toll-like receptor (TLR) signalling plays an important role in epithelial and immune cells of the intestine. TLR9 recognizes unmethylated CpG motifs in bacterial DNA, and TLR9 signalling maintains the gut epithelial homeostasis. Here, we carried out a bioinformatic analysis of the frequency of CpG motifs in the genomes of gut commensal bacteria across major bacterial phyla. The frequency of potentially immunostimulatory CpG motifs (all CpG hexamers) or purine-purine-CG-pyrimidine-pyrimidine hexamers was linearly dependent on the genomic G+C content. We found that species belonging to Proteobacteria, Bacteroidetes and Actinobacteria (including bifidobacteria) carried high counts of GTCGTT, the optimal motif stimulating human TLR9. We also found that Enterococcus faecalis, Lactobacillus casei, Lactobacillus plantarum and Lactobacillus rhamnosus, whose strains have been marketed as probiotics, had high counts of GTCGTT motifs. As gut bacterial species differ significantly in their genomic content of CpG motifs, the overall load of CpG motifs in the intestine depends on the species assembly of microbiota and their cell numbers. The optimal CpG motif content of microbiota may depend on the host’s physiological status and, consequently, on an adequate level of TLR9 signalling. We speculate that microbiota with increased numbers of microbes with CpG motif-rich DNA could better support mucosal functions in healthy individuals and improve the T-helper 1 (Th1)/Th2 imbalance in allergic diseases. In autoimmune disorders, CpG motif-rich DNA could, however, further increase the Th1-type immune responsiveness. Estimation of the load of microbe-associated molecular patterns, including CpG motifs, in gut microbiota could shed new light on host–microbe interactions across a range of diseases.

Received 12 June 2013
Accepted 18 November 2013
Abbreviations: IEC, intestinal epithelial cell; MAMP, microbe-associated molecular pattern; NEC, necrotizing enterocolitis; Pu, purine; Py, pyrimidine; Th, T-helper; TLR, Toll-like receptor.

064220 © 2014 SGM. Printed in Great Britain.

INTRODUCTION

In the mammalian intestine, a single layer of epithelial cells separates the gut lumen and its dense microbial population (microbiota) from the rest of the body. The monolayer of intestinal epithelial cells (IECs) acts both as a physical barrier and as a regulator of the immune responses by sensing contents in the gut lumen and transferring signals to the immune cells residing in the lamina propria (Abreu, 2010; Maynard et al., 2012; Wells et al., 2011). Immune cells can also come into contact with intestinal microbiota by directly sampling the gut lumen (dendritic cells), via the transepithelial transport of antigens by microfold cells or during intestinal ulceration (Abreu, 2010; Maynard et al., 2012; Wells et al., 2011).

The immediate recognition of microbes by the innate immune system plays an important role in immune responsiveness and homeostasis in the intestine (Abreu, 2010; Maynard et al., 2012; Wells et al., 2011). As part of the innate immune system, Toll-like receptors (TLRs) recognize conserved microbial structures, so-called microbe-associated molecular patterns (MAMPs) such as lipoteichoic acids (TLR2), LPS (TLR4), flagellin (TLR5) and nucleic acids (TLR3, -7, -8 and -9) (Abreu, 2010; Maynard et al., 2012; Wells et al., 2011). TLRs were first found in immune cells but later were also discovered in epithelial cells including IECs. Upon TLR stimulation, IECs respond by transcriptional changes and can also transfer signals to the lamina propria compartment (Abreu, 2010; Maynard et al., 2012; Wells et al., 2011). The expression of
TLRs in IECs is polarized in order to provide adequate responses depending on whether the signals come from luminal or invading microbes (i.e. pathogens) (Abreu, 2010; de Kivit et al., 2011; Lee J. et al., 2006; Maynard et al., 2012; Wells et al., 2011). It has been demonstrated that TLR stimulation affects enterocyte proliferation (Gribar et al., 2008), enhances intestinal epithelial barrier function (Cario et al., 2004; O’Hara et al., 2012) and affects the production of antimicrobial peptides (Foureau et al., 2010; Lee J. et al., 2006). Furthermore, TLR stimulation produces tolerance in IECs towards subsequent challenges with the same or other TLR ligands, i.e. induces cross-tolerance and reduces intestinal inflammation (Ghadimi et al., 2010; Lee J. et al., 2006). In addition, TLR signalling has an important role in instructing the adaptive immune system and maintaining the gut epithelial homeostasis (Maynard et al., 2012). Importantly, dysregulation of TLR signalling can lead to an inappropriate reaction towards gut commensals, as well as inflammation of the epithelium (Abreu, 2010; Maynard et al., 2012; Wells et al., 2011).

The canonical ligands for TLR9 include unmethylated CpG motifs prevalent in bacterial but not in vertebrate genomic DNAs (Hemmi et al., 2000; Krieg, 2002; Yu et al., 2007). The optimal motif for activating human cells has been found to have the sequence GTCGTT (Hartmann & Krieg, 2000), whereas the general motif formula for activating mouse and rabbit cells is PuPuCGPyPy (where Pu is purine and Py is pyrimidine) (Krieg et al., 1995; Rankin et al., 2001; Yi et al., 1998). Like most TLRs, TLR9 is expressed both by immune cells including dendritic cells, macrophages and B-cells, and by IECs (Abreu, 2010; Krieg, 2002). In immune cells, TLR9 is expressed intracellularly in the endosomes, whereas IECs display TLR9 on the cell surface (Barton et al., 2006; Lee J. et al., 2006). CpG-induced activation of immune cells triggers a T-helper 1 (Th1)-type immune response (Hemmi et al., 2000; Krieg, 2002; Yu et al., 2007), which has been utilized in the development of vaccine adjuvants and immunotherapies for allergy, cancer and infectious diseases (Krieg, 2012). IECs express TLR9 on both the apical and basolateral sides, and distinct responses occur depending on which side the stimulus comes from (Lee J. et al., 2006). Stimulation of TLR9 on the apical side inhibits the inflammatory response of IECs and induces tolerance towards other MAMPs. Basolateral stimulation, on the other hand, leads to activation of the NF-κB pathway and the release of pro-inflammatory IL-8 (Lee J. et al., 2006). A recent in vitro study demonstrated that apical stimulation of TLR9 of IECs triggered a Th1-type immune response and concurrent regulatory IL-10 secretion of peripheral blood mononuclear cells on the basolateral side (de Kivit et al., 2011). In general, apical stimulation of TLR9 is considered to improve the barrier functions of mucosa (O’Hara et al., 2012). Furthermore, TLR9-deficient mice are more susceptible to dextran sodium sulfate-induced colitis (Lee J. et al., 2006), which also underlines the importance of TLR9 signalling in promoting homeostasis in the gut epithelium.

Whilst the role of TLR9 in maintaining homeostasis in the gut epithelium is well recognized, the CpG motif content of microbiota has received little attention. However, the composition of gut microbiota could have a significant effect on the load of TLR9 ligands in the intestine. In this study, we explored the immunostimulatory CpG motifs in the genomes of commensal intestinal bacteria across major bacterial phyla inhabiting the intestine. We studied the relationship between CpG motif content and G+C content (GC%) or the size of bacterial genomes. Furthermore, we discuss the CpG motif content of various bacteria and microbiota in the context of human health and disease.

METHODS

Bacterial genomes. Complete or almost complete bacterial and archaeal genome sequences, totalling 67 genomes from 59 species, were downloaded in FASTA format from GenBank. The selected species comprised 43 bacterial and three Archaea species found in the intestine and representing different phyla and genomic G+C content, two dairy starter species, six intestinal pathogens, four Corynebacterium sp. that are not typical gut inhabitants including the pathogen Corynebacterium diphtheriae, and the respiratory pathogen Mycobacterium tuberculosis. The bacterial strains and genome sequences are compiled in Tables 1 and 2.

CpG motifs and motif search. The following CpG motifs were searched for in the bacterial genome sequences: all possible CpG motifs, i.e. all CG-containing hexamers (in practice, a search for CG) (Lee K.W. et al., 2006), the general motif formula for activating mouse and rabbit cells (PuPuCGPyPy) (Krieg et al., 1995; Rankin et al., 2001; Yi et al., 1998) and the optimal motif for activating human cells (GTCGTT) (Hartmann & Krieg, 2000). To identify the different motifs and patterns in the bacterial genome sequences, FUZZNUC was used from the EMBOSS (European Molecular Biology Open Software Suite) package (Rice et al., 2000). Additionally, custom-made in-house scripts were used in the analysis.

RESULTS

In this study, we bioinformatically analysed the frequency of potent immunostimulatory CpG motifs (Hartmann & Krieg, 2000; Krieg et al., 1995; Lee K.W. et al., 2006; Rankin et al., 2001; Yi et al., 1998) in bacterial genomes. We included in the analysis: (i) CG dinucleotides (i.e. all possible CpG hexamers); (ii) PuPuCGPyPy hexamers, which represent a general formula for CpG motifs activating mouse or rabbit immune cells; and (iii) the GTCGTT hexamer, which is an optimal motif for activating human cells (Tables 1 and 2). First, we analysed the genomes of several strains of three species, Clostridium perfringens (three strains), Escherichia coli (five strains) and Bifidobacterium bifidum (three strains), in order to assess the possible strain variance in genomic CpG content within the same species (Table 1). We found that different strains of the same species had highly similar genomic CpG content with respect to all studied motifs (Table 1). Furthermore, we studied the genomes of 43 bacterial and three Archaea species found in the intestine and representing different phyla and genomic G+C content (Table 2). In addition to the intestinal
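Table 1. Comparison of CpG motif content among strains belonging to the same species

| Species | Strain | GC% | Size (Mb) | CpG per Mb | PuPuCGPyPy per Mb | GTCGTT per Mb | CpG per genome | PuPuCGPyPy per genome | GTCGTT per genome | Reference for the genome |
|---|---|---|---|---|---|---|---|---|---|---|
| Clostridium perfringens | SM101 | 28 | 2.9 | 11218 | 928 | 23 | 32532 | 2690 | 68 | Myers et al. (2006) |
| Clostridium perfringens | ATCC 13124 | 28 | 3.3 | 9511 | 823 | 23 | 31386 | 2716 | 76 | Myers et al. (2006) |
| Clostridium perfringens | str. 13 | 28 | 3.0 | 10467 | 884 | 22 | 31402 | 2652 | 67 | Shimizu et al. (2002) |
| Escherichia coli | UMN026 | 50 | 5.2 | 147886 | 11864 | 494 | 769008 | 61692 | 2569 | Touchon et al. (2009) |
| Escherichia coli | IAI39 | 50 | 5.1 | 147371 | 11735 | 512 | 751592 | 59846 | 2610 | Touchon et al. (2009) |
| Escherichia coli | 55989 | 50 | 5.2 | 145190 | 11530 | 491 | 754990 | 59956 | 2552 | Touchon et al. (2009) |
| Escherichia coli | ED1a | 50 | 5.2 | 146073 | 11658 | 498 | 759582 | 60622 | 2587 | Touchon et al. (2009) |
| Escherichia coli | S88 | 50 | 5.0 | 147853 | 11866 | 505 | 739264 | 59330 | 2525 | Touchon et al. (2009) |
| Bifidobacterium bifidum | S17 | 63 | 2.2 | 248567 | 15006 | 671 | 549334 | 33164 | 1483 | Zhurina et al. (2011) |
| Bifidobacterium bifidum | PRL2010 | 63 | 2.2 | 248562 | 14992 | 671 | 544350 | 32832 | 1469 | Turroni et al. (2010) |
| Bifidobacterium bifidum | BGN4 | 63 | 2.2 | 248611 | 14888 | 681 | 551916 | 33052 | 1512 | Yu et al. (2012) |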
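
The three motif classes tabulated in Table 1 can be illustrated with plain pattern matching. The sketch below is not the authors' pipeline (they used FUZZNUC from EMBOSS plus in-house scripts); it is a minimal stand-in using Python regular expressions. The function names `count_motifs` and `per_mb` are hypothetical, and counting on both strands is an assumption made here for illustration, since the Methods do not state how strands were handled.

```python
import re

# Pu = purine (A or G), Py = pyrimidine (C or T).
MOTIFS = {
    "CpG": re.compile(r"CG"),                      # every CG dinucleotide
    "PuPuCGPyPy": re.compile(r"[AG][AG]CG[CT][CT]"),
    "GTCGTT": re.compile(r"GTCGTT"),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motifs(seq, both_strands=True):
    """Count each motif class in seq.

    both_strands=True also scans the reverse complement; this is an
    assumption of the sketch, not a documented detail of the study.
    """
    seq = seq.upper()
    strands = [seq, reverse_complement(seq)] if both_strands else [seq]
    return {
        name: sum(len(pattern.findall(strand)) for strand in strands)
        for name, pattern in MOTIFS.items()
    }


def per_mb(count, genome_length_bp):
    """Convert a per-genome count into a frequency per Mb of DNA."""
    return count / (genome_length_bp / 1_000_000)
```

For example, `per_mb(68, 2_900_000)` relates the GTCGTT count per genome of Clostridium perfringens SM101 to its per-Mb frequency in Table 1. None of these three patterns can overlap with itself, so non-overlapping matching with `findall` does not undercount.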
commensal species, two dairy starter species and six intestinal pathogens were included in the analysis. We also included four Corynebacterium sp. that are not typical gut inhabitants and Mycobacterium tuberculosis, which belong to the Actinobacteria and are high G+C content bacteria, for comparison (Table 2).

The CpG hexamer frequency (CpG motifs per Mb of genomic DNA) increased linearly with the genomic G+C content as expected (R² = 0.94, Fig. 1a). There was no linear dependency (R² < 0.80) between the total number of CpG motifs and the genome size (Fig. 1b), but large genomes (over ~4.8 Mb) contained higher numbers of CpG motifs.

The PuPuCGPyPy motifs (the general motif formula for activating mouse and rabbit cells) showed a linear increase in their relative frequency (per Mb) with increasing genomic G+C content (R² = 0.80, Fig. 1c). There was no linear dependency between the number of PuPuCGPyPy motifs and the genome size, but large genomes of over ~4.8 Mb were noted to have higher counts of these motifs (Fig. 1d, Table 2). Bacteria with more than the median value of PuPuCGPyPy motifs per genome included all Actinobacteria, all Proteobacteria except for Acinetobacter calcoaceticus and Campylobacter jejuni, Bacteroides thetaiotaomicron and Parabacteroides distasonis within the Bacteroidetes, Lactobacillus casei, Lactobacillus rhamnosus and Lactobacillus plantarum within the Bacillus subgroup of Firmicutes, Eubacterium limosum and Faecalibacterium prausnitzii within the Firmicutes Clostridium clusters XV and IV, respectively, and Akkermansia muciniphila (Verrucomicrobia). Six species belonging to Proteobacteria, Escherichia coli, Shigella dysenteriae, Salmonella enterica, Enterobacter aerogenes, Klebsiella pneumoniae and Pseudomonas aeruginosa, stood out for having exceptionally high counts of PuPuCGPyPy motifs (Table 2). The above-mentioned bacteria with the exception of Bacteroides thetaiotaomicron also had a high frequency of PuPuCGPyPy motifs (motifs per Mb of genomic DNA). In addition, Roseburia hominis and Lactobacillus delbrueckii had a frequency above the median value of PuPuCGPyPy motifs. The highest PuPuCGPyPy motif frequencies were found in Escherichia coli, Shigella dysenteriae, Salmonella enterica, Enterobacter aerogenes, K. pneumoniae, Pseudomonas aeruginosa and some Actinobacteria, i.e. Bifidobacterium adolescentis, Bifidobacterium longum subsp. infantis, Bifidobacterium bifidum, Corynebacterium jeikeium, Corynebacterium urealyticum and Mycobacterium tuberculosis.

The total amount or frequency of GTCGTT (the optimal motif stimulating human TLR9) showed an increase (although not linear) with the increase in genomic G+C content (Fig. 1e). Several species showed an exceptionally high frequency or counts of these motifs as explained in more detail below. Similarly to all CpG hexamers and PuPuCGPyPy motifs, there was no linear dependency between the number of GTCGTT motifs and genome size (Fig. 1f). However, large genomes of over ~4.8 Mb had higher counts of the ‘human-specific’ motif compared with the smaller genomes (Fig. 1f, Table 2). Bacteria with more than the median value of GTCGTT motifs per genome included species belonging to the phylum Bacteroidetes (mostly having large genomes); Proteobacteria with the exception of Campylobacter jejuni; and Enterococcus faecalis, Lactobacillus plantarum, Lactobacillus rhamnosus, Lactobacillus casei and Bacillus cereus within the Bacillus subgroup of Firmicutes (Table 2). Among Actinobacteria (high G+C content bacteria), all studied Bifidobacterium spp. were found to have GTCGTT motif counts above the median value. In contrast, within the genus Corynebacterium, only two species (Corynebacterium glutamicum and Corynebacterium diphtheriae) had GTCGTT motif counts above the median value. Three other species within the same genus, Corynebacterium jeikeium, Corynebacterium urealyticum and Corynebacterium efficiens, had much lower counts of this motif, despite having comparable G+C content and size to Corynebacterium glutamicum, Corynebacterium diphtheriae and bifidobacteria (Table 2). K. pneumoniae, Enterobacter aerogenes, Escherichia coli, Shigella dysenteriae,
Table 2. CpG motif content in prokaryotic genomes

| Phylum/subgroup | Species and strain | GC% | Size (Mb) | CpG per genome | PuPuCGPyPy per genome | GTCGTT per genome | CpG per Mb | PuPuCGPyPy per Mb | GTCGTT per Mb | GTCGTT per (Mb×GC%) | Reference for the genome |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bacteria | | | | | | | | | | | |
| Firmicutes/Bacilli | Staphylococcus aureus Mu3 | 32 | 2.9 | 146732 | 12876 | 1192 | 50597 | 4440 | 411 | 13 | Neoh et al. (2008) |
| Firmicutes/Bacilli | Staphylococcus haemolyticus JCSC1435 | 32 | 2.7 | 131822 | 11900 | 1058 | 48823 | 4407 | 392 | 12 | Takeuchi et al. (2005) |
| Firmicutes/Bacilli | Staphylococcus epidermidis ATCC 12228 | 32 | 2.5 | 109898 | 9512 | 897 | 43959 | 3805 | 359 | 11 | Zhang et al. (2003) |
| Firmicutes/Bacilli | Lactobacillus acidophilus NCFM | 34 | 2.0 | 93054 | 6714 | 544 | 46527 | 3357 | 272 | 8 | Altermann et al. (2005) |
| Firmicutes/Bacilli | Lactococcus lactis MG1363* | 35 | 2.5 | 128128 | 9678 | 947 | 51251 | 3871 | 379 | 11 | Wegmann et al. (2007) |
| Firmicutes/Bacilli | Bacillus cereus ATCC 14579 | 35 | 5.4 | 338112 | 22926 | 2097 | 62267 | 4222 | 386 | 11 | Ivanova et al. (2003) |
| Firmicutes/Bacilli | Lactobacillus helveticus DPC 4571 | 37 | 2.1 | 117914 | 9446 | 621 | 56150 | 4498 | 296 | 8 | Callanan et al. (2008) |
| Firmicutes/Bacilli | Enterococcus faecalis V583 | 37 | 3.2 | 225216 | 19166 | 1675 | 70380 | 5989 | 523 | 14 | Paulsen et al. (2003) |
| Firmicutes/Bacilli | Listeria monocytogenes 08-5923† | 38 | 3.0 | 239154 | 15662 | 1302 | 79718 | 5221 | 434 | 11 | Gilmour et al. (2010) |
| Firmicutes/Bacilli | Streptococcus thermophilus LMD-9* | 39 | 1.9 | 102542 | 8232 | 789 | 53969 | 4333 | 415 | 11 | Makarova et al. (2006) |
| Firmicutes/Bacilli | Streptococcus mitis B6 | 40 | 2.2 | 119418 | 8366 | 789 | 55543 | 3891 | 367 | 9 | Denapaite et al. (2010) |
| Firmicutes/Bacilli | Streptococcus sanguinis SK36 | 43 | 2.4 | 157740 | 10336 | 801 | 65725 | 4307 | 334 | 8 | Xu P. et al. (2007) |
| Firmicutes/Bacilli | Lactobacillus plantarum WCFS1 | 44 | 3.4 | 366552 | 28602 | 2834 | 108127 | 8437 | 836 | 19 | Kleerebezem et al. (2003) |
| Firmicutes/Bacilli | Lactobacillus casei BL23 | 46 | 3.1 | 346378 | 25098 | 1678 | 111735 | 8096 | 541 | 12 | Mazé et al. (2010) |
| Firmicutes/Bacilli | Lactobacillus rhamnosus GG | 47 | 3.0 | 360288 | 26338 | 2278 | 119697 | 8750 | 757 | 16 | Kankainen et al. (2009) |
| Firmicutes/Bacilli | Lactobacillus delbrueckii ATCC 11842 | 49 | 1.8 | 189146 | 12630 | 999 | 105081 | 7017 | 555 | 11 | van de Guchte et al. (2006) |
| Firmicutes/Clostridium cluster I | Clostridium perfringens ATCC 13124 | 28 | 3.3 | 31386 | 2716 | 76 | 9511 | 823 | 23 | 1 | Myers et al. (2006) |
| Firmicutes/Clostridium cluster III | Clostridium stercorarium DSM 8532 | 42 | 3.0 | 274600 | 14982 | 811 | 92458 | 5044 | 273 | 6 | Poehlein et al. (2013) |
| Firmicutes/Clostridium cluster IV | Faecalibacterium prausnitzii SL3/3 | 56 | 3.2 | 449598 | 28532 | 1112 | 140062 | 8888 | 346 | 6 | A. Pajon, K. Turner, J. Parkhill, S. Duncan & H. Flint, unpublished |
| Firmicutes/Clostridium cluster IV | Ruminococcus bromii L2-63 | 41 | 2.3 | 207116 | 8946 | 502 | 92052 | 3976 | 223 | 5 | A. Pajon, K. Turner, J. Parkhill, S. Duncan & H. Flint, unpublished |
| Firmicutes/Clostridium cluster XI | Clostridium difficile 630† | 29 | 4.3 | 61950 | 4452 | 342 | 14407 | 1035 | 80 | 3 | Sebaihia et al. (2006) |
| Firmicutes/Clostridium cluster XIII | Finegoldia magna ATCC 29328 | 32 | 2.0 | 76826 | 3850 | 451 | 38606 | 1935 | 227 | 7 | Goto et al. (2008) |
| Firmicutes/Clostridium cluster XIVa | Butyrivibrio fibrisolvens 16/4 | 39 | 3.2 | 133190 | 8734 | 502 | 42149 | 2764 | 159 | 4 | A. Pajon, K. Turner, J. Parkhill, S. Duncan & H. Flint, unpublished |
| Firmicutes/Clostridium cluster XIVa | Ruminococcus obeum A2-162 | 43 | 3.8 | 298074 | 12036 | 952 | 79275 | 3201 | 253 | 6 | A. Pajon, K. Turner, J. Parkhill, S. Duncan & H. Flint, unpublished |
| Firmicutes/Clostridium cluster XIVa | Eubacterium rectale ATCC 33656 | 42 | 3.5 | 223334 | 9772 | 598 | 64734 | 2832 | 173 | 4 | Mahowald et al. (2009) |
| Firmicutes/Clostridium cluster XIVa | Clostridium saccharolyticum WM1 | 45 | 4.7 | 353358 | 17988 | 633 | 75828 | 3860 | 136 | 3 | S. Lucas, A. Copeland, A. Lapidus, J. F. Cheng, D. Bruce, L. Goodwin, S. Pitluck, O. Chertkov, J. C. Detter, C. Han, R. Tapia, M. Land, L. Hauser, Y. J. Chang, C. Jeffries, N. Kyrpides, N. Ivanova, N. Mikhailova, H. Mouttaki, L. Lin, J. Zhou, C. L. Hemme & T. Woyke, unpublished |
| Firmicutes/Clostridium cluster XIVa | Roseburia hominis A2183 | 49 | 3.6 | 447556 | 24014 | 940 | 124667 | 6689 | 262 | 5 | I. E. Mulder, R. I. Aminov, A. J. Travis, A. Lan, V. Gaboriau-Routhiau, K. Garden, E. Logan, M. Delday, A. G. P. Coutts, G. Grant, A. M. Patterson, N. Cerf-Bensussan & D. Kelly, unpublished |
| Firmicutes/Clostridium cluster XV | Eubacterium limosum KIST612 | 48 | 4.3 | 455480 | 37892 | 1167 | 106421 | 8853 | 273 | 6 | Roh et al. (2011) |
| Proteobacteria | Campylobacter jejuni NCTC 11168† | 31 | 1.6 | 47246 | 3524 | 180 | 28809 | 2149 | 110 | 4 | Parkhill et al. (2000) |
| Proteobacteria | Acinetobacter calcoaceticus PHEA-2 | 39 | 3.9 | 258684 | 18792 | 1453 | 67017 | 4868 | 376 | 10 | Zhan et al. (2011) |
| Proteobacteria | Proteus mirabilis HI4320 | 39 | 4.1 | 295416 | 24422 | 1645 | 72053 | 5957 | 401 | 10 | Pearson et al. (2008) |
| Proteobacteria | Yersinia enterocolitica 8081† | 47 | 4.7 | 504190 | 34436 | 1393 | 107733 | 7358 | 298 | 6 | Thomson et al. (2006) |
| Proteobacteria | Escherichia coli UMN026 | 50 | 5.2 | 769008 | 61692 | 2569 | 147886 | 11864 | 494 | 10 | Touchon et al. (2009) |
| Proteobacteria | Shigella dysenteriae Sd197† | 51 | 4.6 | 653436 | 54768 | 2155 | 143297 | 12011 | 473 | 9 | Yang et al. (2005) |
| Proteobacteria | Salmonella enterica ATCC 9150† | 52 | 4.6 | 777938 | 70348 | 2229 | 169485 | 15326 | 486 | 9 | McClelland et al. (2004) |
| Proteobacteria | Enterobacter aerogenes KCTC 2190 | 55 | 5.3 | 972776 | 85598 | 2468 | 184238 | 16212 | 467 | 12 | Shin et al. (2012) |
| Proteobacteria | Klebsiella pneumoniae 1084 | 57 | 5.4 | 1016530 | 91182 | 2337 | 188596 | 16917 | 434 | 8 | Lin et al. (2012) |
| Proteobacteria | Desulfovibrio desulfuricans ND132 | 58 | 3.9 | 820924 | 38116 | 1879 | 212675 | 9875 | 487 | 8 | Brown et al. (2011) |
| Proteobacteria | Pseudomonas aeruginosa PAO1 | 67 | 6.3 | 1526246 | 121866 | 3106 | 243809 | 19467 | 496 | 7 | Stover et al. (2000) |
| Bacteroidetes | Prevotella melaninogenica ATCC 25845 | 41 | 3.2 | 209866 | 13482 | 1595 | 66204 | 4253 | 503 | 12 | D. M. Harkins, R. Madupu, A. S. Durkin, M. Torralba, B. Methe, G. G. Sutton & K. E. Nelson, unpublished |
| Bacteroidetes | Bacteroides thetaiotaomicron VPI-5482 | 42 | 6.2 | 579932 | 26668 | 2479 | 93537 | 4301 | 400 | 10 | Xu et al. (2003) |
| Bacteroidetes | Bacteroides vulgatus ATCC 8482 | 42 | 5.2 | 418768 | 18816 | 1452 | 80532 | 3618 | 279 | 7 | Xu J. et al. (2007) |
| Bacteroidetes | Bacteroides fragilis NCTC 9343 | 43 | 5.2 | 489410 | 19700 | 1900 | 94117 | 3788 | 365 | 8 | Cerdeño-Tárraga et al. (2005) |
| Bacteroidetes | Parabacteroides distasonis ATCC 8503 | 45 | 4.8 | 581492 | 29726 | 2189 | 120892 | 6180 | 455 | 10 | Xu J. et al. (2007) |
| Actinobacteria | Corynebacterium glutamicum ATCC 13032‡ | 53 | 3.3 | 467022 | 33978 | 1503 | 141522 | 10296 | 455 | 9 | Ikeda & Nakagawa (2003) |
| Actinobacteria | Corynebacterium diphtheriae NCTC 13129‡§ | 53 | 2.4 | 375648 | 26286 | 1546 | 156520 | 10953 | 644 | 12 | Cerdeño-Tárraga et al. (2003) |
| Actinobacteria | Bifidobacterium breve ACS-071-V-Sch8b | 59 | 2.3 | 458450 | 25740 | 1381 | 196760 | 11047 | 593 | 10 | A. S. Durkin, M. Kim, D. Radune, J. Hostetler, M. Torralba, M. Gillis, B. Methe, G. Sutton & K. E. Nelson, unpublished |
| Actinobacteria | Bifidobacterium longum subsp. infantis ATCC 15697 | 59 | 2.8 | 604886 | 33922 | 1732 | 216031 | 12115 | 619 | 10 | Sela et al. (2008) |
| Actinobacteria | Bifidobacterium adolescentis ATCC 15703 | 59 | 2.1 | 467150 | 27266 | 1342 | 223517 | 13046 | 642 | 11 | T. Suzuki, Y. Tsuda, N. Kanou, T. Inoue, K. Kumazaki, S. Nagano, S. Hirai, K. Tanaka & K. Watanabe, unpublished |
| Actinobacteria | Bifidobacterium longum DJO10A | 60 | 2.4 | 506276 | 27898 | 1385 | 210948 | 11624 | 577 | 10 | Lee et al. (2008) |
| Actinobacteria | Corynebacterium jeikeium K411‡ | 61 | 2.4 | 492366 | 30430 | 1274 | 205153 | 12679 | 531 | 9 | Tauch et al. (2005) |
| Actinobacteria | Bifidobacterium bifidum BGN4 | 63 | 2.2 | 551916 | 33052 | 1512 | 248611 | 14888 | 681 | 11 | Yu et al. (2012) |
| Actinobacteria | Corynebacterium efficiens YS-314‡ | 63 | 3.1 | 596944 | 29358 | 1047 | 192563 | 9470 | 338 | 5 | Nishio et al. (2003) |
| Actinobacteria | Corynebacterium urealyticum DSM 7109 | 64 | 2.4 | 518856 | 32276 | 1019 | 216190 | 13448 | 425 | 7 | Tauch et al. (2008) |
| Actinobacteria | Mycobacterium tuberculosis H37Rv§ | 66 | 4.4 | 1123610 | 73812 | 2794 | 254787 | 16737 | 634 | 10 | Cole et al. (1998) |
| Verrucomicrobia | Akkermansia muciniphila ATCC BAA-835 | 56 | 2.7 | 388570 | 27780 | 678 | 146079 | 10444 | 255 | 5 | van Passel et al. (2011) |
| Median value | | | | 363420 | 24218 | 1322 | 99599 | 6085 | 401 | 9 | |
| Archaea | | | | | | | | | | | |
| Euryarchaeota/Methanobacteria | Methanobrevibacter smithii ATCC 35061 | 31 | 1.9 | 40592 | 1450 | 109 | 21942 | 784 | 59 | | Samuel et al. (2007) |
| Euryarchaeota/Methanobacteria | Methanobacterium sp. AL-21 | 36 | 2.6 | 80308 | 3998 | 397 | 31127 | 1550 | 154 | | S. Lucas, A. Copeland, A. Lapidus, J. F. Cheng, L. Goodwin, S. Pitluck, O. Chertkov, J. C. Detter, C. Han, R. Tapia, M. Land, L. Hauser, N. Kyrpides, N. Ivanova, N. Mikhailova, I. Pagani, H. Cadillo-Quiroz, H. Imachi, S. Zinder, W. Liu & T. Woyke, unpublished |
| Euryarchaeota/Methanomicrobia | Methanoculleus bourgensis MS2 | 61 | 2.8 | 548746 | 34030 | 1293 | 196683 | 12197 | 463 | | Maus et al. (2012) |

*Dairy starter, not considered as an indigenous intestinal species.
†Species primarily considered as a pathogen in the human intestine.
‡Not major intestinal Corynebacterium sp. but included in the analysis as representatives of the phylum.
§Pathogen/other than intestinal.
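
The last numeric column of Table 2 divides the GTCGTT frequency per Mb by the genomic GC%. As a minimal sketch of that normalization, the snippet below recomputes it for a few rows copied from Table 2 and compares the rounded result with the table's median of 9. The names `gc_normalized` and `above_median` are hypothetical helpers, and rounding to the nearest integer is an assumption made to match the table's whole-number presentation.

```python
# (species and strain, GC %, GTCGTT per Mb) copied from selected Table 2 rows.
ROWS = [
    ("Staphylococcus aureus Mu3", 32, 411),
    ("Lactobacillus plantarum WCFS1", 44, 836),
    ("Lactobacillus rhamnosus GG", 47, 757),
    ("Escherichia coli UMN026", 50, 494),
    ("Akkermansia muciniphila ATCC BAA-835", 56, 255),
]

TABLE_MEDIAN = 9  # median GTCGTT frequency per (Mb x GC%) given in Table 2


def gc_normalized(freq_per_mb, gc_percent):
    """GTCGTT frequency per (Mb x GC%)."""
    return freq_per_mb / gc_percent


def above_median(rows, threshold=TABLE_MEDIAN):
    """Names of genomes whose rounded normalized frequency exceeds the median."""
    return [
        name
        for name, gc_percent, freq_per_mb in rows
        if round(gc_normalized(freq_per_mb, gc_percent)) > threshold
    ]
```

Calling `above_median(ROWS)` lists the genomes in this small subset whose normalized frequency is higher than the median; extending `ROWS` to the full table is a matter of copying the remaining rows.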
[Figure 1: six scatter plots. Panels (a), (c) and (e) plot motif frequency per Mb (CpG, PuPuCGPyPy and GTCGTT, respectively) against GC% (x-axis 25–70). Panels (b), (d) and (f) plot the corresponding motif counts per genome against genome size in Mb (x-axis 1.5–6.5). Linear fits are annotated in (a) (R² = 0.94) and (c) (R² = 0.80). Legend groups: Actinobacteria; Actinobacteria**; Firmicutes/Bacilli; Firmicutes/Bacilli*; Firmicutes/Clostridium cluster XIVa; Firmicutes/Other Clostridium clusters; Bacteroidetes; Verrucomicrobia; Proteobacteria; Pathogens.]

Fig. 1. Frequency and counts of CpG motifs in bacterial genomes. (a, c, e) The dependence of CpG motif frequency (i.e. the CpG motif counts per Mb) on the G+C content (GC%) of genomic DNA. (b, d, f) The dependence of CpG motif count on genome size (Mb). CpG stands for all possible CpG motifs (i.e. all CG-containing hexamers), PuPuCGPyPy is the general motif formula for activating mouse and rabbit cells, and GTCGTT is the optimal motif for activating human cells. R² values for linear correlation (R² ≥ 0.80) are shown. Squares represent non-pathogenic bacteria regularly found in the healthy human intestinal tract. Dairy starter species within Bacilli (*) and not major intestinal species within Actinobacteria (**) are shaded a lighter colour. Triangles represent pathogenic species. The bacterial species and their CpG motif counts are compiled in Table 2.
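
The R² values annotated in Fig. 1 describe how well a straight line explains motif frequency as a function of GC%. A minimal sketch of that calculation, assuming an ordinary least-squares fit with one predictor (for which R² equals the squared Pearson correlation), is shown below. The helper name `linear_r2` is hypothetical, and the six data points are an illustrative subset of Table 2, so the result is not expected to reproduce the published value obtained from all genomes.

```python
def linear_r2(xs, ys):
    """Coefficient of determination for a simple least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # With a single predictor, R^2 is the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# GC% and CpG per Mb for six Table 2 genomes (illustrative subset only):
# C. perfringens, S. aureus, L. plantarum, E. coli,
# B. longum subsp. infantis, P. aeruginosa.
gc_percent = [28, 32, 44, 50, 59, 67]
cpg_per_mb = [9511, 50597, 108127, 147886, 216031, 243809]

r2_subset = linear_r2(gc_percent, cpg_per_mb)
```

The same function can be applied to the PuPuCGPyPy or GTCGTT columns, or to counts per genome against genome size, to mirror the other panels.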
Salmonella enterica, P. aeruginosa, Bacillus cereus, Lactobacillus plantarum, Lactobacillus rhamnosus, Parabacteroides distasonis, Bacteroides thetaiotaomicron and Mycobacterium tuberculosis were found to carry the highest GTCGTT motif counts per genome. Bacteria with a high frequency of ‘human-specific’ CpG motifs (motifs per Mb of genomic DNA) included the above-mentioned bacteria with some exceptions: Bacillus cereus, Yersinia enterocolitica and Bacteroides thetaiotaomicron did not show a high frequency of GTCGTT motifs, whereas an additional four species, Staphylococcus aureus, Streptococcus thermophilus, Lactobacillus delbrueckii and Corynebacterium efficiens, showed a frequency that was above the median value. The highest frequency of GTCGTT motifs was found in Lactobacillus plantarum, Lactobacillus rhamnosus, Bifidobacterium sp., Corynebacterium diphtheriae and Mycobacterium tuberculosis. In general, most species belonging to Actinobacteria or Firmicutes/Bacillus, four out of 11 species within Proteobacteria and three out of five species within Bacteroidetes showed a higher than median frequency of GTCGTT motifs when the frequency was normalized against the genomic G+C content, i.e. frequency per (Mb×GC%) (Table 2, Fig. 2). Thus, specific species had a higher frequency of GTCGTT motifs than could be expected by their genomic G+C content.