TheISMEJournal(2011)5,1262–1278 &2011InternationalSocietyforMicrobialEcology Allrightsreserved1751-7362/11 www.nature.com/ismej ORIGINAL ARTICLE Community ecology of hot spring cyanobacterial mats: predominant populations and their functional potential Christian G Klatt1, Jason M Wood1, Douglas B Rusch2, Mary M Bateson1, Natsuko Hamamura1, John F Heidelberg3, Arthur R Grossman4, Devaki Bhaya4, Frederick M Cohan5, Michael Ku¨hl6,7, Donald A Bryant8,9 and David M Ward1 1DepartmentofLandResourcesandEnvironmentalSciences,MontanaStateUniversity,Bozeman,MT,USA; 2J Craig Venter Institute, Rockville, MD, USA; 3Department of Biological Sciences, Philip K Wrigley Marine ScienceCenter,UniversityofSouthern California, Avalon,CA,USA;4DepartmentofPlantBiology,Carnegie InstitutionforScience,StanfordUniversity,Stanford,CA,USA;5DepartmentofBiology,WesleyanUniversity, Middletown, CT, USA; 6Department of Biology, Marine Biological Section, University of Copenhagen, Helsingør, Denmark; 7Plant Functional Biology Climate Change Cluster, University of Technology, Sydney, New South Wales, Australia; 8Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA and 9Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT, USA Phototrophic microbial mat communities from 601C and 651C regions in the effluent channels of Mushroom and Octopus Springs (Yellowstone National Park, WY, USA) were investigated by shotgun metagenomic sequencing. Analyses of assembled metagenomic sequences resolved six dominant chlorophototrophic populations and permitted the discovery and characterization of undescribed but predominant community members and their physiological potential. Linkage of phylogenetic marker genes and functional genes showed novel chlorophototrophic bacteria belonging to uncharacterized lineages within the order Chlorobiales and within the Kingdom Chloroflexi. The latter is the first chlorophototrophic member of Kingdom Chloroflexi that lies outside the monophyletic group of chlorophototrophs of the Order Chloroflexales. Direct comparison of unassembled metagenomic sequences to genomes of representative isolates showed extensive genetic diversity, genomic rearrangements and novel physiological potential in native populations as compared with genomic references. Synechococcus spp. metagenomic sequences showed a high degree of synteny with the reference genomes of Synechococcus spp. strains A and B0, but synteny declined with decreasing sequence relatedness to these references. There was evidence of horizontal gene transfer among native populations, but the frequency of these events wasinverselyproportionalto phylogenetic relatedness. The ISME Journal(2011)5, 1262–1278;doi:10.1038/ismej.2011.73;published online23June 2011 Subject Category: microbial population and community ecology Keywords: cyanobacteria; Chloroflexi; community structure and function; metagenomics Introduction cyanobacteria(Synechococcusspp.)formineffluent channels of these springs between temperatures of The cyanobacterial mats of alkaline siliceous hot 71 and 751C (the upper temperature limit of the springs in Yellowstone National Park (Supplemen- phototrophic mats), and B501C. tary Figures 1A and 1B) have been studied for Analysis of environmental 16S ribosomal RNA several decades as models for understanding the (rRNA) gene sequences showed a poor relationship composition, structure and function of microbial between initially cultivated isolates and predomi- communities (Brock, 1978; Ward et al., 1987, 1992, nant native populations. For instance, the predomi- 2002, 2011a). Simple and stable microbial commu- nantSynechococcusspp.ofthesemats(A/Blineage) nities containing dense populations of unicellular had p92% nucleotide identity at the 16S rRNA locus to the cultivated representatives available at Correspondence: CG Klatt, Department of Land Resources and that time (Ward et al., 1990). Similarly, based on Environmental Sciences, Montana State University, 334 Leon cultivation and pigment analyses (Bauld and Brock, JohnsonHall,POBox173120,Bozeman,MT59717-3120,USA. 1973; Pierson and Castenholz, 1974), it was once E-mail:[email protected] thoughtthatChloroflexusspp.,whichincultureuse Received15February2011;revised9May2011;accepted9May 2011;publishedonline23June2011 bacteriochlorophyll (BChl)-c and BChl-a to support Metagenomesofhotspringphototrophiccommunities CGKlattetal 1263 photo-heterotrophy (Pierson and Castenholz, 1974) frequency distributions and cluster analysis of or photo-autotrophy (Holo and Sireva˚g, 1986; scaffolds, to identify phylogenetically distinctive Strauss and Fuchs, 1993), were the dominant populations inhabiting the Octopus Spring and anoxygenic phototrophic bacteria in these mats. Mushroom Spring mats. Oligonucleotide frequency However, environmental 16S rRNA studies uncov- patterns contain phylogenetic information (Pride ered the importance of Roseiflexus spp. (Nu¨bel etal.,2003;Teelingetal.,2004)andhavebeenused et al., 2002), organisms that contain BChl-a but lack as a tool to determine phylogenetic signatures in BChl-c and grow axenically as photo-heterotrophs metagenomic data from microbial communities (Hanada et al., 2002), although they possess genes (Woyke et al., 2006; Wilmes et al., 2008; Dick encoding the enzymes of the 3-hydroxypropionate et al., 2009; Inskeep et al., 2010). Annotation of autotrophic pathway (Klatt et al., 2007). In these open reading frames was used to identify phylo- cases, the inference of chlorophototrophic physiol- genetically and functionally informative genes in ogies (that is, Chls are obligately required for the scaffolds. We used the sequenced genomes of phototrophy, in contrast to retinal-mediated proton selected organisms, many of which have been translocation) could be made because oxygenic cultivated from these or similar hot spring environ- chlorophototrophs and anoxygenic chlorophoto- ments, and some of which are close relatives of trophic Chloroflexales comprise monophyletic predominant native populations in these mats, to groups defined by 16S rRNA phylogeny. These ‘recruit’ metagenomic sequences (Supplementary predictions were confirmed with more recent culti- Table 1). This combined approach enabled us to (i) vationandgenomicanalysesofSynechococcusspp. discover new major populations of uncultivated and Roseiflexus spp. isolates closely related to community members; (ii) explore differences in the nativematpopulations(Allewaltetal.,2006;Bhaya functional potential of native populations as com- et al., 2007; van der Meer et al., 2010). pared with closely related isolates and (iii) observe The inference of functional potential from 16S differences in genomic content and synteny among rRNA phylogeny is more problematic when closely related populations. This study also created sequences do not belong to groups that are mono- a foundation for a companion study using meta- phyletic with respect to function. For instance, transcriptomics to describe in situ gene expression based on the observation that some 16S rRNA inthechlorophototrophictaxa(Liuetal.,2011),the sequences retrieved from the mats fell just outside results of which strongly support our functional the monophyletic clade of known Chlorobiales, inferencesandexpanduponinsitugeneexpression Ferris and Ward (1997) suggested the possible studies of these mats (Steunou et al., 2006, 2008; presence of bacteria closely related to green sulfur Jensen et al., 2011). bacteria. Targeted analyses of photosynthetic reac- tion center genes provided evidence in support of this hypothesized functional group (Bryant et al., Materials and methods 2007), but there was no way to associate the functional genes directly with the phylogenetic The experimental approaches have been presented marker gene. Despite the successful retrieval of in the following sections. Technical details of the Chlorobiales from other thermal environments methods used have been provided in Supplemen- (Wahlund et al., 1991; Madigan et al., 2005), this tary Information Section 3. organism has to date evaded cultivation. Interest- ingly, the search for photosynthetic reaction center genes in the mats led to the discovery of the first Collection, preliminary sequence analysis and known chlorophototrophic member of Kingdom metagenomic sequencing Acidobacteria, Candidatus Chloracidobacterium Microbial mats were collected from Mushroom thermophilum (Ca. C. thermophilum) (Bryant Spring (44.53861N, 110.79791W) on 2 October 2003 et al., 2007; usage of ‘kingdom’ for major Domain and from Octopus Spring (44.53401N, 110.79781W) sub-lineages sensu Ward et al., 2011b). The infer- on5November2004(Bhayaetal.,2007)atsiteswith ence of potential chlorophototrophy was based on average temperatures of B60 and B651C. Synecho- thediscoveryofametagenomicclonecontainingan coccus spp. genotypes B0 and A are the respective insert of mat DNA with both phylogenetic marker dominant cyanobacterial 16S rRNA sequences at and functional genes. Because cultivated acidobac- these temperatures. Samples were collected and teriawerenotpreviouslyknowntobephototrophic, sectioned vertically into approximately 1-mm-thick inferences based on 16S rRNA data concerning the layers, which were frozen and then stored at (cid:2)801C potentialforphototrophycouldnothavebeenmade until further analysis. After enzymatic lysis of cells before this discovery. Studies of an enrichment in the top green layer, DNA was extracted and culture of Ca. C. thermophilum (Bryant et al., sequences were characterized by PCR amplification 2007)anditsgenome(Bryantetal.,2011)confirmed of cyanobacterial 16S rRNA genes and subsequent the inferences made from genetic data. analysisbydenaturinggradient-gelelectrophoresisto In this study, we used assembly of metagenomic verify the presence of Synechococcus A- and B0-like sequences, combined with oligonucleotide genotypes (Supplementary Figure 3). The extracted TheISMEJournal Metagenomesofhotspringphototrophiccommunities CGKlattetal 1264 DNA was sheared into B1–3 and B10–12-kb genomes from organisms isolated from thermal fragments, which were used to prepare four meta- springs, known to be functionally and/or phylo- genomic libraries corresponding to the low- and genetically related to indigenous mat populations high-temperature samples from both Octopus and and/or processes, or representative of phylogenetic MushroomSprings.Endsequencesofclonedinserts groups not otherwise included (Supplementary were produced by Sanger sequencing at the J Craig Table 1). The percent nucleotide identity (% NT Venter Institute (JCVI, Rockville, MD, USA). ID) of metagenomic sequences relative to the reference genome that recruited them was used to identify those that could be confidently associated Metagenome assembly and annotation withthereferenceorganism,takingintoaccountthe The metagenomic sequences were assembled into % NT ID between the genomes of strains of named scaffolds using the Celera assembler (Miller et al., species and genera (approximately 470% NT ID 2008), with the ‘error’ (that is, mismatch) rate set to amongspeciesofnamed genera;seeSupplementary 8% for the purpose of assembling non-identical InformationSection3andSupplementaryFigure7). close relatives, and the utgGenomeSize set to The end sequences of a clone were considered 2000000. The phylogenetic and functional marker ‘jointlyrecruited’ifthesequenceswererecruitedby genes in assemblies were identified using the the same genome, or were considered ‘disjointly programs AMPHORA (Wu and Eisen, 2008), the recruited’ if their end sequences were recruited by JCVIannotationpipeline(Tanenbaumetal.,2010)or different reference genomes. The end sequences of BLAST (Altschul et al., 1990), using known refer- jointlyrecruitedcloneswereconsidered‘syntenous’ ence sequences as queries. All annotations are whenthesequenceshadthesameorientationasthe inferences based on multiple lines of evidence reference genome and were separated by a distance produced using the tools listed above, but their that was similar to the size of the DNA fragments functions are considered hypotheses for future used to construct the metagenomic library. Jointly biochemical characterization. recruited sequences that did not meet both of these criteria were considered ‘non-syntenous.’ The de- tails of this process are described in Supplementary Clustering and characterization of assemblies Information Section 3. Oligonucleotide patterns were determined to obtain phylogenetic signals (Teeling et al., 2004) by count- ing the frequencies of all possible tri-, tetra-, penta- Results and hexa-nucleotide combinations for each scaffold X20000bp. Frequency counts were normalized by Sanger sequencing of samples from all sites and thelengthoftherespectivescaffoldandsubjectedto temperatures yielded 167Mb of metagenomic k-means clustering (Kanungo et al., 2002) using the sequence data.Assembly resulted in5769scaffolds, a priori value of k equal to 8 (see Supplementary totaling 33Mb, which were produced from 67Mb Information Section 3 for rationale). Scaffolds that (40%)ofthetotalsequencedataset.Clusteranalysis clustered together with X90% of 100 Monte Carlo of oligonucleotide frequencies was used to charac- trialsweremappedusingCytoscape(Shannonetal., terize 394 scaffolds that were X20000bp in length, 2003).Manyscaffoldsformedassociationswithcore totaling 20.2Mb (Table 1). Prior to assembly, clusters at less stringent thresholds, but, except recruitment by reference genomes above the speci- where noted, these were not included in the cluster fied%NTIDcutoffsindicatedinTable2accounted analysis described here. for 102Mb of the total sequence data set (61%). Scaffold clusters accounted for an additional 13Mb (7.8%) of the total unassembled metagenomic BLASTNrecruitmentandsyntenywithreferencegenomes sequences that were not recruited to reference Metagenomic sequences were used as queries in a genomes above % NT ID cutoffs. Thus, we could custom BLASTN search to a selected database of 20 confidently assign 69% of the total metagenomic Table1 AssemblystatisticsofscaffoldclustersX20000bpinlength Cluster Phylogeneticaffiliation Numberof Mbsequence Meanscaffold Meandepthof scaffolds incluster length(kb) coverage(readdepth) 1 Synechococcusspp. 68 3.54 52.1 4.7 2 Roseiflexusspp. 78 4.26 54.7 3.7 3 Chloroflexusspp. 59 1.91 32.4 1.7 4 Ca.C.thermophilum-likeorganisms 10 3.20 319 3.7 5 Chlorobiales-likeorganisms 32 2.82 88.0 5.7 6 Anaerolineae-likeorganisms 46 2.31 50.3 3.2 7 UnknownCluster-1 27 1.42 52.4 2.2 8 UnknownCluster-2 39 1.62 41.6 4.4 TheISMEJournal Metagenomesofhotspringphototrophiccommunities CGKlattetal 1265 Table2 Comparisonofmetagenomicanalysesbasedongenomerecruitmentandassemblya Referencegenome Reference Mean±s.d.% %NTIDrange %ofindividual %oftotal genome NTIDof usedinanalysesb metagenomicsequences assembled size(Mb) recruited recruitedc sequences sequences MSlow MShigh OSlow OShigh Total Synechococcussp.strain-A 2.93 94.1±10.8 92–100 7.03 22.1 6.36 21.1 11.0 9.78 SynechococcusA0d — — 83–92 0.63 2.92 1.23 1.14 1.57 1.15 Synechococcussp.strain-B0 3.04 95.6±7.4 90–100 22.1 1.15 24.9 1.10 17.7 8.75 Roseiflexussp.strain-RS1 5.80 84.6±16.1 80–100 16.0 9.58 7.52 14.9 9.17 9.42 Chloroflexussp.strain-396-1 5.2 90.4±6.6 65–100 0.77 16.0 0.99 1.91 4.63 4.62 Ca.C.thermophilum 3.7 78.5±11.0 70–100 9.66 2.15 10.4 6.69 8.10 9.23 C.thalassium 3.29 63.4±5.3 50–100 6.46 2.11 11.6 3.48 8.41 7.65 T.roseum 2.93 64.0±12.7 75–100 0.21 0.66 0.03 0.15 0.21 0.05 T.thermophilus 2.11 73.6±11.6 75–100 0.08 0.80 0.22 0.27 0.35 0.13 T.yellowstonii 2.00 73.4±8.0 75–100 0.46 0.15 0.04 0.12 0.11 0.03 Abbreviation:%NTID,percentnucleotideidentity. aAllresultsmetthecriterionofhavingane-valuemoresignificantthan10(cid:2)10fortheWU-BLASTNparametersM¼3andN¼(cid:2)2,andadatabase sizeof68Mb. bRelativetothereferencegenome. cTheabbreviationsforthefourmetagenomesareasfollows:MS,MushroomSpring;OS,OctopusSpring;low,601Caverage;high,651Caverage. dRecruitedbytheSynechococcussp.strain-Agenomewith83–92%NTID. sequences to known taxa or novel phylogenetic previously (Steunou et al., 2006, 2008; Bhaya et al., clusters bycombiningthese approaches;31%ofthe 2007). Most (86%) of these metagenomic sequences metagenomic sequences are currently of unknown werejointlyrecruitedandweremorecloselyrelated origin. Consistent with the failure to detect 18S to either the Synechococcus sp. strain-A or B0 rRNA sequences at these temperatures (Liu et al., genome(SupplementaryFigure8).Thecyanobacterial 2011), no eukaryotic sequences were observed. scaffolds in these bins accounted for 19.7% of the Asideofarelativeunderrepresentationofsequences total assembled sequence data (Table 2), which was fromCa.C.thermophilum(SupplementaryFigure4), the largest amount assigned to any particular group pyrosequencing of SSU rDNA amplicons from of organisms. Differences between these cyano- environmentalDNAshowedtaxonomicprofilesthat bacterial scaffolds and the Synechococcus spp. weresimilartothoseforcDNAsequencesproduced isolate genomes were found and provide evidence from rRNAs for the meta-transcriptome studies (Liu for functional diversity. Scaffolds from native et al., 2011). Sequences likely originating from Synechococcus sp. strain-A-like populations con- archaeawerepresent,butwerenotinhighabundance tained genes encoding feoAB (involved in Fe2þ in the upper photic layer of these mats. transport) and genes homologous to the character- ized bacterial enzymes urea carboxylase (ureA) and allophanate hydrolase (atzF; involved in the degra- Major populations and their functional potential dation of urea into ammonia and CO ), both of 2 Clusteringonthebasisofoligonucleotidefrequency which are not found in the Synechococcus sp. showedeightscaffoldclusters(Figure1andTable1). strain-A genome (Supplementary Table 9) (Kana- Phylogenetic affiliations of these clusters were mori et al., 2004; Cheng et al., 2005). inferred from (i) direct co-clustering with reference (ii) Filamentous Anoxygenic Chlorophototrophs. genomes (Figure 1); (ii) clusters being composed of Cluster-2 scaffolds had similar oligonucleotide sequences recruited by a reference genome at high frequencies to both Roseiflexus sp. strain RS1 and % NT ID (Figure 2 and Table 2 and Supplementary Roseiflexus castenholzii genomes, and they predo- Table 7) and (iii) the presence of phylogenetically minantly comprised sequences recruited by the informative marker genes within the clusters Roseiflexus sp. strain RS1 genome (98%, with a (Figure 1 and Table 3). The metabolic potentials of meanof95% NTID; Supplementary Table7). Many organisms associated with these clusters were conserved phylogenetic marker genes, with inferred from the functional genes they contained. sequences almost identical to homologs in the (i) Oxygenic Chlorophototrophs. Cluster-1 con- Roseiflexus sp. strain RS1 genome, were found on tained scaffolds that were strongly associated with Cluster-2 scaffolds (Table 3). Most of the Cluster-2 the Synechococcus spp. strains A and B0 genomes, sequences were jointly recruited by the Roseiflexus and included cyanobacterial phylogenetic marker sp. strain RS1 genome with more than 80% NT ID genes and functional genes that were indicative of (Figure 2), which was above the mean from oxygenic photosynthesis, the Calvin–Benson– a comparison of Roseiflexus sp. strain RS1 and Bassham cycle and genes involved in nitrogen R. castenholzii homologs (Supplementary informa- and phosphorus acquisition that were described tionSection3).Thisobservationimpliesthatalarge TheISMEJournal Metagenomesofhotspringphototrophiccommunities CGKlattetal 1266 Figure 1 A network map of the core scaffold clusters observed in the Celera assemblies. Scaffolds with similar oligonucleotide frequencyprofilesthatgrouptogetherinthesameclusterareconnectedbylinescoloredtoindicatethepercentageoftimestheycluster together(inX90%of100trials).Theisolategenomesincludedinthisanalysisareindicatedbylargewhitecircles,whereasmetagenomic scaffoldsthatcontaincharacterizedphylogeneticmarkergenesareindicatedbymedium-sizedcirclescoloredaccordingtotaxonomic grouping.Theareaofeachellipseisproportionaltotheamountofmetagenomicsequencedatacontainedwithineachrespectivescaffold cluster. proportionofscaffoldsarerepresentedbysequences homologs of bidirectional [NiFe]-hydrogenases fromadiverseassemblageofRoseiflexusspp.,andis (hydAB) (Table 3; van der Meer et al., 2010). One consistent with the diversity of sequences directly open reading frame homologous to a nifH gene in recruited by the Roseiflexus sp. strain RS1 genome the Roseiflexus sp. strain RS1 genome was also by BLASTN independently of a metagenomic observed. assembly (Figure 2). One scaffold in Cluster-2 Oligonucleotide compositions of Cluster-3 scaf- contained a diagnostic fused pufLM gene that folds were not similar to any sequenced isolate encodes both of the type-2 photosystem reaction genomes above the 90% cutoff; however, the center polypeptides (pufL and pufM are character- phylogenetic and functional marker genes they istically fused in Roseiflexus spp.; Youvan et al., contained indicated that these scaffolds were con- 1984; Yamada et al., 2005) (Figure 3). There were tributed by Chloroflexus spp. Most (82%) of the recAsequenceshighly similartotheRoseiflexus sp. metagenomic sequences comprising these scaffolds strain RS1 recA in themetagenome (Supplementary were recruited at a high degree of similarity Figure 10), but these were not encoded on the large (Supplementary Table 7) by the genome of Chloro- scaffolds included in the cluster analysis. Suggest- flexus sp. strain 396-1, which is currently the most ing that these organisms have the capability to fix representative cultivated organism compared with inorganic carbon, Cluster-2 also contained eight the native Chloroflexus spp. in these mats (van der open reading frames homologous to Roseiflexus Meer et al., 2010). Most (85%) of the metagenome spp. genes encoding key enzymes in the 3-hydro- sequences recruited by the Chloroflexus sp. strain xypropionate pathway (Klatt et al., 2007). Like 396-1 genome were jointly recruited sequences that Roseiflexus sp. strain RS1, Roseiflexus spp. native had a mean % NT ID of 91.3±5.3% (Figure 2). One to the mat may have the potential to use H as an Cluster-3 scaffold contained a pufC homolog 2 electron donor because Cluster-2 scaffolds contain adjacent to bchP and bchG, consistent with the TheISMEJournal Metagenomesofhotspringphototrophiccommunities CGKlattetal 1267 Figure 2 Histograms of disjointly recruited (green), jointly recruited syntenous (red) and jointly recruited, non-syntenous (blue) metagenomicsequencesthatcanbeassociatedconfidentlywithareferencegenomepresentedasafunctionoftheir%NTIDrelativeto thereferencegenomesthatrecruitedthemintheBLASTNanalysis.%NTID,percentnucleotideidentity. Chloroflexus sp. 396-1 genome (93% NT ID, 100% known isolates with respect to light-harvesting amino-acid identity) (Figure 3). Overlapping strategies (Bryant and Frigaard, 2006; Frigaard and metagenomesequencesweremissingupstreamfrom Bryant, 2006; Bryant et al., 2011) (Table 3). the pufC open reading frame, so it could not Sequences encoding two key enzymes in the be confirmed whether the native Chloroflexus 3-hydroxypropionate pathway, and most closely spp. have the pufBAC operon structure observed in related to homologs in the Chloroflexus sp. strain other Chloroflexus spp. (Watanabe et al., 1995). 396-1 genome, were present on Cluster-3 scaffolds. However, the colocalized bchG and bchP genes and This suggests that Chloroflexus spp. in the mats high % NT ID to Chloroflexus sp. 396-1 are may be capable of carbon fixation by the 3-hydro- consistent with this inference derived from oligo- xypropionate pathway. Cluster-3 contained a nucleotide clustering (Figure 3). Homologs of genes homolog of sulfide-quinone oxidoreductases (sqr) involved in both BChl-c and BChl-a biosynthesis in Chloroflexus spp., which suggested that these were present in Cluster-3, indicating that the native organisms might oxidize sulfide to polysulfides Chloroflexus spp. are physiologically similar to (Bryant et al., 2011). TheISMEJournal Metagenomesofhotspringphototrophiccommunities CGKlattetal 1268 Table3 Phylogeneticmarkergenesandfunctionalgenesinassemblyclusters Cluster:phylogeny Phylogeneticgenesa Functionalgenesb Pathways/functions 1.Synechococcusspp. 16SrRNA chlB,chlG,chlJ,psaA,psaL,psbB,psbC Oxygenicchlorophototrophy pyrG cpcABCDEFG,apcABC,nblA recA rbcX,cbbS,cbbL,FBPaldolase,PRK Calvin–Benson–Basshamcycle rplB,rplC,rplD,rplE,rplF,rplK, ctaC,ctaD,ctaE Oxygenrespiration rplL,rplN,rplM,rplP,rplT nifH Nitrogenfixation rpoB narB Nitratemetabolism rpsB,rpsE,rpsI,rpsJ,rpsK, amtB Ammoniumtransport rpsM,rpsS pstABCS Phosphatetransport smpB phnCEGHILJ Phosphonatemetabolism tsf 2.Roseiflexusspp. 16SrRNA bchB,bchC,bchD,bchE,bchF,bchG,bchH-1, BChl-abiosynthesis nusA bchH-2,bchI,bchJ,bchL,bchM,bchN,bchP, rplA,rplB,rplC,rplD,rplE, bchT-like,bchX,bchY,bchZ rplN,rplP,rplT pufB,pufC,pufLM Anoxygenicchlorophototrophy rpsB,rpsC,rpsI,rpsS cyoB,coxA,coxB Oxygenrespiration tsf mch,mcl,mcr,mct,meh,pcs,smtA,smtB 3-Hydroxypropionatepathway hydAB Hydrogenase amtB Ammoniumtransport 3.Chloroflexusspp. infC bchB,bchG,bchH-1,bchL,bchN,bchP,bchS, BChl-aandBChl-cbiosynthesis nusA bchY,slr1923-homolog pgk mcr 3-Hydroxypropionatepathway recA pcs,pufC Anoxygenicchlorophototrophy rplT cyoB,coxC Oxygenrespiration sqr ReducedSoxidation hydAB Hydrogenase pstABCS Phosphatetransport 4.Ca. 16SrRNA bchD,bchE,bchF,bchL,bchU,acsF,bchI, BChl-aandBChl-cbiosynthesis C.thermophilum-like dnaG bchM,bchX,bchB,bchN,bchK,bchC,bchZ, organisms frr bchY,bchG,bchP nusA csmA Chlorosomebiosynthesis pgk amtB Ammoniumtransport pyrG recA rplE,rplM,rplT rpsB,rpsI smpB 5.Chlorobiales-like 16SrRNA bchB,bchC,bchD,bchF,bchG-homolog, BChl-aandBChl-dbiosynthesis organisms frr bchH,bchH-homolog,bchI,bchK,bchL, rplB,rplC,rplD,rplF,rplK,rplL, bchP,bchR,bchX,bchY,bchZ rplN,rplP,rplS,rplT fmoA Anoxygenicchlorophototrophy rpsE,rpsI,rpsJ,rpsK,rpsM pscA,pscB,csmC smpB norE-likecyt.coxidase Oxygenrespiration tsf amtB Ammoniumtransport 6.Anaerolineae-like 16SrRNA bchF,bchG,bchI,bchP,bchS-like,bchX, BChl-abiosynthesis organisms frr bchY,bchZ infC pufL,pufM,pufC Anoxygenicchlorophototrophy pgk coxA,coxB Oxygenrespiration pyrG sqr ReducedSoxidation recA rplA,rplK,rplL,rplM,rplT rpoB rpsI,rpsM smpB tsf 7.UnknownCluster-1 dnaG glcD,glcE Glycolateoxidation rplE,rplN,rplP,rpsC,rpsK, acs Acetatemetabolism rpsM,rpsS coxA,coxB,coxC Oxygenrespiration 8.UnknownCluster-2 dnaG coxA,coxB,coxC Oxygenrespiration infC pgk pyrG recA rplB,rplC,rplK,rplL,rplM, rplN,rplP rpsC,rpsE,rpsJ,rpsM aPhylogeneticmarkergenesidentifiedusingAMPHORAand/orphylogeneticanalysis. bFunctional genes identified using BLAST and annotated using hidden Markov models, genomic context and/or phylogenetic analysis (seeSupplementaryInformationSection3). TheISMEJournal Metagenomesofhotspringphototrophiccommunities CGKlattetal 1269 Figure3 ThePufLandPufMphylogenyandgenomiccontext.Theneighbor-joiningphylogenetictreeofthePufLandPufMsequences from a novel Chloroflexi metagenomic scaffold from Cluster-6 and from sequenced genomes; asterisks at nodes indicate bootstrap support 450% (1000 replications). A more detailed tree is shown in Supplementary Figure 12. The genomic context of the genes encoding the type-2 reaction center and the light-harvesting polypeptides in the metagenomic scaffolds and chromosomes of ChloroflexusandRoseiflexusisolatesisalsoshown.Thejaggedlinesindicatepositionsonscaffoldsthatareinterruptedbyalackof overlappingsequencedatabetweencontigs. (iii) Ca. Chloracidobacterium spp. Cluster-4 con- whereas there are native mat organisms nearly tainedfivescaffoldscontainingphylogeneticmarker identicaltotheCa.C.thermophilumisolateatsome genes with best matches to Kingdom Acidobacteria loci(Figure2),mostCluster-4sequencesarederived (including a recA sequence labeled ‘RecA Cabt’ in fromorganismsthataremoredistantlyrelatedtoCa. Supplementary Figure 10). These scaffolds had C. thermophilum than any species belonging to the distinct oligonucleotide frequency patterns as com- same genera that we investigated (Supplementary pared with the Ca. C. thermophilum genome, of information Section 3). The high proportion of which a detailed analysis will be published sepa- syntenous, jointly recruited metagenome sequences rately (A Garcia Costas, Z Liu, L Tomsho, SC from thegenomerecruitment analysis was evidence Schuster, DM Ward and DA Bryant, submitted), for conservation of synteny within this population, despite the fact that 97% of the sequences from which probably contributed in part to the longer- thesescaffoldswererecruitedbythisgenomewitha than-average assemblies. mean of 82.5% NT ID (Figure 2 and Supplementary (iv) Chlorobiales-Like Organisms. Cluster-5 scaf- Table 7). Genes involved in BChl and chlorosome folds had oligonucleotide frequency signatures biosynthesiswereobservedonthesescaffolds,anda similar to that of the Chloroherpeton thalassium gene encoding a type-1 photosynthetic reaction genome (Figure 1) and contained phylogenetic center gene (pscA) was observed when the cluster- marker and functional genes (Table 3) that are ing stringency was lowered to 80%. Although the typicalofmembersoftheChlorobiales.Thegenome number of Cluster-4 scaffolds was small, these ofC.thalassiumrecruited8.4%ofthemetagenomic scaffolds were the largest produced by the Celera sequences across all temperature–spring combina- assembler (the average size was 4300000bp, and tions, most of which were from low-temperature the largest was 1.6Mb; see Table 1). The Ca. samples and were disjointly recruited (Table 2 and C. thermophilum genome recruited 8.1% of all Figure 2). Although they were not found on unassembled metagenome sequences, 90.8% of scaffolds 420kb, many recA sequences were which were jointly recruited (Figure 2). The % NT recruitedthat,liketheC.thalassiumrecAsequence, ID distribution of these sequences suggested that, form an out-group to the clade that contains the TheISMEJournal Metagenomesofhotspringphototrophiccommunities CGKlattetal 1270 well-characterized chlorophototrophs in the order and pufM sequences homologous to Chloroflexi Chlorobiales(SupplementaryFigure10).The63.4% sequences but in a unique genomic context mean NT ID to C. thalassium homologs was (Figure 3). Phylogenetic analysis of the PufL and approximately equal to the % NT ID of homologs PufM sequences showed that, in comparison with belongingtodifferentgenerawithinakingdom-level those of known filamentous anoxygenic chloro- lineage (Figure 2 and Supplementary information phototrophs in the Chloroflexales, these sequences Section 3). Hence, phylogenetic information alone occupynoveland/orbasalpositionsinaphylogenetic did not provide high confidence that these tree (Figure 3 and Supplementary Figure 12). When sequences were derived from members of the compared to their closest homologs in the Chloro- Chlorobiales. Functional genes found on the flexusandRoseiflexusspp.genomes,thesePufLand scaffolds of this cluster clarified the potential PufM sequences had amino-acid identities of 48% physiological properties of this population. In and 62%, respectively. particular, one scaffold contained a gene encoding Assembly-independentBLASTNanalysisshowed a homolog of the Fenna–Matthews–Olson protein, thatthe metagenome sequences comprising Cluster-6 whichisaBChl-a-bindingantennaproteininvolved scaffolds had low % NT ID (60–66%) to the in anoxygenic photosynthesis and only known to Chloroflexi genomes. Approximately 33% of the occur in the members of the Chlorobiales and sequences comprising the Cluster-6 scaffolds were chlorophototrophic Acidobacteria (Bryant et al., not recruited by any reference genome above the 2007, 2011). High-performance liquid chromato- established cutoffs, and thus were ‘null’ bin graphic analysis of pigments extracted from the sequences (see Supplementary Table 7). Mushroom Spring mat will be published elsewhere (vi) Novel Putatively Chemoorganotrophic Popu- (M Pagel and DA Bryant, unpublished data), lations. The scaffolds in Clusters 7 and 8 did not but preliminary results indicated the presence of have oligonucleotide frequencies similar to any BChl-d, which replaces BChl-c as the aggregated tested isolate genomes, and contained functional pigments within the chlorosomes of ‘green’ chloro- and phylogenetic marker genes (including ‘RecA 7’ phototrophicChlorobiales.OtherCluster-5scaffolds in Supplementary Figure 10) with very distant contained homologs of the reaction center subunit relationships to sequences in currently available gene pscA (‘OS GSB PscA’; Bryant et al., 2007), public databases. Most metagenomic sequences pscB, pscD as well as csmC, a gene encoding a contained in these scaffolds were not recruited by chlorosome envelope protein that has no homologs a reference genome above the specified cutoff and in other chlorosome-containing chlorophototrophs were assigned to the ‘null’ bin, but some sequences and thus is currently diagnostic for Chlorobiales wererecruitedatlow%NTIDbymultiplegenomes (Bryant et al., 2011). (Supplementary Table 7). Clusters 7 and 8 did not (v) A Novel Anaerolineae-Like Chlorophototroph. contain any genes homologous to those specific for Cluster-6 scaffolds were not similar in oligonucleo- chlorophototrophy. Both clusters contained genes tide composition to any isolate genome, but encoding caa -type cytochrome c oxidases, which 3 contained phylogenetic marker genes associated suggested the potential for aerobic oxidative phos- with bacteria from Kingdom Chloroflexi (Figure 1). phorylation to exist in the organisms contributing The RDP Bayesian Classifier assigned a full-length these sequences. Cluster-7 additionally included 16S rRNA sequence in this cluster to the taxonomic scaffolds encoding glycolate oxidase (glcD) and class Anaerolineae with 95% confidence, and this acetyl-CoA synthetase (acs) genes (Table 3). Thus, observationwassupportedbyphylogeneticanalysis the organisms contributing these sequences may (see Supplementary Figure 11). Furthermore, genes have the potential for aerobic chemoorganotrophy encoding ribosomal proteins and recA genes using glycolate and/or acetate as an electron donor. (Table3)supportedthiskingdom-levelphylogenetic No assembly clusters corresponded to organisms assignment. In particular, a recA gene associated related to Thermomicrobium roseum, Thermus with assembly Cluster-6 (‘RecA 6’; Supplementary thermophilus or Thermodesulfovibrio yellowstonii, Figure 10) is phylogenetically earlier diverging but the genomes of these isolates recruited than the monophyletic clade containing known sequences above 75% NT ID (Table 2 and Figure 2). chlorophototrophic Chloroflexales (for example, Allotherreferencegenomesrecruitedalownumber Roseiflexus and Chloroflexus spp.). Several genes of sequences with low % NT ID values (Supple- involved in anoxygenic chlorophototrophy were mentary Figure 13). Approximately 20% of metage- encoded onthesamescaffoldasthe16S rRNAgene nomic sequences could not be associated with any in Cluster-6. This cluster also contained bchXYZ reference genome above an e-value cutoff of 10(cid:2)10 genes encoding the subunits of the light-indepen- usingthespecifiedparametersandwereassignedto dent chlorophyllide reductase, an enzyme required the ‘null’ bin. for the biosynthesis of BChl-a (Chew and Bryant, 2007), as well as other BChl biosynthesis genes (bchD, bchF, bchH and bchI) common to the BChl-a Patterns of metagenomic diversity and BChl-c biosynthetic pathways. A separate (i) Multiple Populations in Recruitment Bins. scaffold in this cluster contained non-fused pufL Recruitment analysis of the metagenomic clones TheISMEJournal Metagenomesofhotspringphototrophiccommunities CGKlattetal 1271 from the 651C Mushroom Spring sample showed at between the Synechococcus sp. strain-A and B0 least two populations, one with 492% NT ID and genomes was nearly identical to that observed one with 83–92% NT ID, relative to the Synecho- empirically, but synteny between the Synechococ- coccus sp. strain-A genome (Figure 4). The more cus sp. strain-A genome and the more distantly divergent sequences were likely contributed by relatedgenomeswasalmostundetectable(Figure5). A0-like Synechococcus spp., as they showed 498% NT ID with homologs in a metagenome Evidence for homologous recombination. Meta- produced by pyrosequencing from a 681C sample genomic clones, whose disjointly recruited ends from Mushroom Spring, known to be dominated by can each be confidently associated with different these genotypes (Supplementary information Sec- reference genomes, provided evidence for possible tion 6 and Supplementary Figures 3 and 14; Ferris past gene exchange between A-like Synechococcus et al., 2003). These accounted for only 1.57% of the spp. and members of the Synechococcus A0 and B0 A-like sequences in all metagenomes (Table 2). lineages,aswellasbetweenthesecyanobacteriaand (ii) Synteny versus Relatedness. There was a filamentousanoxygenicchlorophototrophsorCa.C. positive relationship between the degree of genetic thermophilum. The relative percentage of clones, relatedness and the conservation of synteny in both whose end sequences could be confidently asso- metagenomic sequences and genomic reference ciatedwithSynechoccoccussp.strain-Aononeend sequences as compared with Synechococcus sp. and with other populations on the other end, strain-A (Figure 5). Metagenomic sequences origi- decreased from 26% for all A0-like sequences (that nating from A-like organisms (that is, X92% NT ID is,83–92%NTIDtoSynechococcussp.strain-A;no with the Synechococcus sp. strain-A genome) isolate genomeis availablefrom thisorganismtype) showed greater synteny with respect to the Syne- to 4.5% for all Synechococcus sp. strain-B0-like chococcus sp. strain-A genome than did sequences sequences (that is, 490% NT ID to Synechococcus associated with A0-like organisms (that is, 83–92% sp. strain-B0), to1.1%forsequences associatedwith NT ID with the Synechococcus sp. strain-A gen- a more distantly related cyanobacterial reference ome), which in turn showed higher synteny than genome (that is, Thermosynechococcus elongatus did B0-like sequences (that is, comparing sequences BP-1)andto0.2%forsequencesassociatedwithyet that had X90% NT ID to the Synechococcus sp. more distantly related genomes (that is, Roseiflexus strain-B0 genome with homologs in the Synecho- sp. strain RS1, Chloroflexus sp. strain 396-1 or Ca. coccussp.strain-Agenome).Toassesssyntenywith C. thermophilum). Many of these disjointly re- more distantly related isolate genomes, we com- cruited metagenome sequences encoded CRISPR- pared paired-end sequences of simulated meta- associated proteins putatively involved in adaptive genomic fragments (comprising sequence fragments responses to phage predation. Some recombination from representative cyanobacterial isolate genomes events among cyanobacteria and more distantly fractionated to reflect the range of sizes and the related organisms may thus be indicative of phage– abundances of our Sanger metagenome clone in- hostinteractions(SupplementaryTable9;Heidelberg serts) with the Synechococcus sp. strain-A genome etal.,2009).Otherdisjointlyrecruitedcyanobacterial (Supplementary information Section 3). Synteny sequences encoded transposases on the linked Figure 4 Position of alignments and the corresponding % NT ID to the Synechococcus sp. A genome of syntenous (red) and non-syntenous(blue)sequencesjointlyrecruitedbytheSynechococcussp.AgenomefromtheMushroomSpringB651Cmetagenome. Eachendsequenceisconnectedbyalinetoitsclonemate.SequencessuspectedtooriginatefromSynechococcussp.A0-likepopulations rangingfrom83to92%NTIDareindicatedontherightsideofthegraph.%NTID,percentnucleotideidentity. TheISMEJournal
Description: