TheISMEJournal(2014)8,1440–1451 &2014InternationalSocietyforMicrobialEcology Allrightsreserved 1751-7362/14 www.nature.com/ismej ORIGINAL ARTICLE Single-cell enabled comparative genomics of a deep ocean SAR11 bathytype J Cameron Thrash1,2, Ben Temperton1, Brandon K Swan3, Zachary C Landry1, Tanja Woyke4, Edward F DeLong5,6, Ramunas Stepanauskas3 and Stephan J Giovannoni1 1Department of Microbiology, Oregon State University, Corvallis, OR, USA; 2Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA; 3Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA; 4DOE Joint Genome Institute, Walnut Creek, CA, USA; 5Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA and 6Center for Microbial Ecology: Research and Education, Honolulu, HI, USA Bacterioplankton of the SAR11 clade are the most abundant microorganisms in marine systems, usuallyrepresenting25%ormoreofthetotalbacterialcellsinseawaterworldwide.SAR11isdivided into subclades with distinct spatiotemporal distributions (ecotypes), some of which appear to be specifictodeepwater.HereweexaminethegenomicbasisfordeepoceandistributionofoneSAR11 bathytype (depth-specific ecotype), subclade Ic. Four single-cell Ic genomes, with estimated completeness of 55%–86%, were isolated from 770m at station ALOHA and compared with eight SAR11surfacegenomesandmetagenomicdatasets.SubcladeIcgenomesdominatedmetagenomic fragment recruitment below the euphotic zone. They had similar COG distributions, high local syntenyandsharedalargenumber(69%)oforthologousclusterswithSAR11surfacegenomes,yet were distinct at the 16S rRNA gene and amino-acid level, and formed a separate, monophyletic group in phylogenetic trees. Subclade Ic genomes were enriched in genes associated with membrane/cell wall/envelope biosynthesis and showed evidence of unique phage defenses. The majority of subclade Ic-specfic genes were hypothetical, and some were highly abundant in deepoceanmetagenomicdata,potentiallymaskingmechanismsfornichedifferentiation.However, theevidencesuggeststheseorganismshaveasimilarmetabolismtotheirsurfacecounterparts,and thatsubcladeIcadaptationstothedeepoceandonotinvolvelargevariationsingenecontent,but rather more subtle differences previously observed deep ocean genomic data, like preferential amino-acid substitutions, larger coding regions among SAR11 clade orthologs, larger intergenic regions andlarger estimated averagegenome size. The ISME Journal(2014)8, 1440–1451;doi:10.1038/ismej.2013.243; publishedonline23 January 2014 Subject Category: Integrated genomics and post-genomics approaches in microbial ecology Keywords: bathytype; ecotype; metagenomics; SAR11; single-cell genomics; deep ocean Introduction necessarily being adapted to cold and increased pressure there, the deep sea also contains more Characterized by darkness, average temperatures of recalcitrant forms of carbon than at the surface approximately 2–41C, increased hydrostatic pres- (Arıstegui et al., 2009; Nagata et al., 2010; Robinson sure and general oligotrophy, the relatively extreme et al., 2010). Cultivated isolates have revealed some environment of the deep ocean is also the largest microbial adaptations associated with life at depth, biome on the Earth. The mesopelagic (200–1000m) includingincreasedintergenicspacerregions,rRNA and bathypelagic (1000–4000m) zones contain gene indels and higher abundances of membrane 470% of marine microbial biomass (Arıstegui polyunsaturated fatty acids and surface-adhesion/ et al., 2009) and these organisms have vital roles motility genes (Simonato et al., 2006; Lauro and in global cycling of carbon, nitrogen and other Bartlett,2008;Wangetal.,2008;Nagataetal.,2010). biogeochemical processes (Nagata et al., 2010; However, many of the most abundant bacterial Robinsonetal.,2010).Inadditiontomicroorganisms groupsfromthedeepoceanremainuncultivated,for example, the SAR202, SAR324 and SAR406 clades, which make up significant fractions of microbial Correspondence: JC Thrash, Department of Biological Sciences, communities at depth (Giovannoni et al., 1996; LouisianaStateUniversity,BatonRouge, LA,USA. Gordon and Giovannoni, 1996; Wright et al., 1997; E-mail:[email protected] DeLongetal.,2006;Morrisetal.,2006;Varelaetal., Received 13 August 2013; revised 7 November 2013; accepted 10December2013;publishedonline23January2014 2008;Schattenhoferetal.,2009;Treuschetal.,2009; DeepwaterSAR11bathytypegenomics JCThrashetal 1441 Morris et al., 2012). Thus, it remains uncertain how of predicted sugar transporters. As subclade Vorgan- widespread the known adaptations of cultivated ismsbloomatthesurfaceconcurrentlywiththemore isolates are among deep ocean microorganisms. numerically dominant subclade Ia ecotype (Vergin Metagenomic analyses have provided evidence for et al., 2013), genetic machinery for the oxidation of commongenomicfeaturesinthedeepocean,suchas sugars may provide a means of niche differentiation. increased proliferation of transposable elements and A recent study has pointed toward a deep SAR11 phage, amino-acid content changes, and increased bathytype (depth-specific ecotype (Lauro and average genome size (DeLong et al., 2006; Bartlett, 2008)), phylogenetically distinct from the Konstantinidis et al., 2009). Single-cell genomic currently cultivated strains. This ‘subclade Ic’ was analyses provide another powerful means to under- represented by a single 16S clone library sequence stand the metabolism and evolution of organisms that preferentially recruited pyrosequencing reads eluding cultivation-based techniques (Stepanauskas, from depths of 200m and below at the Bermuda 2012;Blainey,2013;Lasken,2013;Rinkeetal.,2013). Atlantic Time-series Study site (BATS; Vergin et al., This approach provided the first insight into the 2013), and formed a monophyletic group with 16S metabolism of several of these deep ocean clades, sequences from single-cell genomes collected at including SAR324, Arctic96BD-19 and Agg47, and 770m at Station ALOHA. Here we present a made the important discovery that at least some of comparative analysis of subclade Ic utilizing four these organisms are capable of chemoautotrophy single-amplified genomes (SAGs), metagenomes (Swan et al., 2011). The findings from single-cell from euphotic, meso-, bathy- and hadopelagic genomics are consistent with widespread autotrophy samples and eight pure-culture SAR11 genomes genesinotherdominantdeepoceanmicroorganisms, from three surface subclades. We tested the hypoth- suchastheThaumarchaea(Karneretal.,2001;Pester esis that the subclade Ic genomes would have et al., 2011), and direct measurements of high levels featuresthatdistinguishthisbathytypefromsurface of carbon fixation in the meso- and bathypelagic organisms to yield a better understanding of SAR11 zones (Reinthaler et al., 2010). adaptationstotheoceaninteriorandofthegenomic Another abundant group of microorganisms that basis for SAR11 subclade differentiation by depth. populates the deep ocean is SAR11. Bacterioplank- ton of the SAR11 clade are the most numerous in Materials and methods marine systems, typically comprising B25% of all prokaryotic cells (Morris et al., 2002; Schattenhofer Comparative genomics et al., 2009). Although the majority of research has Single-cell separation, multiple displacement ampli- focused on the SAR11 clade in the euphotic and fication(MDA),qualitycontrolandSAGselectionfor upper mesopelagic zones, multiple studies have sequencing based on MDA kinetics were all carried demonstrated evidence of substantial SAR11 popu- outasdescribedpreviously(Swanetal.,2011).More lations deeper in the mesopelagic, as well as in the detaileddescriptionsareavailableinSupplementary bathy- and even hadopelagic (46000m) realms Methods.SequencingandassemblyoftheSAGswere (Martin-Cuadrado et al., 2007; Konstantinidis carriedoutbytheDOEJointGenomeInstituteaspart et al., 2009; Schattenhofer et al., 2009; Quaiser ofaCommunitySequencingProgramgrant2011-387. et al., 2010; Swan et al., 2011; Eloe et al., 2011a, b; The SAG Whole Genome Shotgun projects have King et al., 2013). been deposited at DDBJ/EMBL/GenBank under the SAR11, or the ‘Pelagibacterales,’ is a diverse group, accession numbers: AZHR00000000 (AAA240-E13), spanningatleast18%16SrRNAgenedivergence,and AZHQ00000000 (AAA288-E13), AZYB00000000 iscomprisedofsubcladeswithuniquespatiotemporal (AAA288-G21) and AZYC00000000 (AAA288-N07). distributions (ecotypes) that follow seasonal patterns The versions described in this paper are versions (Field et al., 1997; Carlson et al., 2009; Giovannoni AZHR01000000 (AAA240-E13), AZHQ01000000 and Vergin, 2012; Grote et al., 2012; Vergin et al., (AAA288-E13), AZYB01000000 (AAA288-G21) and 2013). All genome-sequenced representatives are AZYC01000000(AAA288-N07).Genomeannotations characterized by small (1.3-1.4Mbp), streamlined can be accessed using the Integrated Microbial genomes with low GC content, few gene duplications Genome (IMG) database (http://img.jgi.doe.gov). and an obligately aerobic, heterotrophic metabolism SAG gene orthology with other SAR11 genomes generally focused on oxidation of low-molecular- was completed using the Hal pipeline (Robbertse weight carbon compounds, such as carboxylic and etal.,2011) anda seriesof custom filters, described amino acids, osmolytes and methylated compounds in detail in Supplementary Methods. Post assembly (Schwalbach et al., 2010; Yilmaz et al., 2011; Carini quality control was assisted by examination of gene et al., 2012; Grote et al., 2012). Representatives conservation across SAR11 strains. SAG genome spanning the known subclade diversity have an completion was evaluated based on599 single-copy unusually high level of core genome conservation genes present in all eight pure culture SAR11 and gene synteny, however, some subclade-specific genomes. Overall, SAG genome completion percen- genomic features have been identified (Grote et al., tage was based on the percentage of these orthologs 2012). The subclade V representative, HIMB59, found in the SAGs (Supplementary Table S1). encodes a complete glycolysis pathway and a variety Average amino-acid identity and local synteny TheISMEJournal DeepwaterSAR11bathytypegenomics JCThrashetal 1442 between genomes were calculated with the scripts/ (2011), and hmmsearch (Eddy, 2011) using default methods of Yelton et al. (2011). Pairwise 16S rRNA settings. gene identity was calculated with megablast using All phylogenetic analyses, with the exception of default settings. COG distribution among SAR11 proteorhodopsin, were completed by aligning genomes was part of data supplied by IMG sequences with MUSCLE (Edgar, 2004) and comput- (Supplementary Table S1). Patterns of amino-acid ing trees with RAxML (Stamatakis, 2006; Stamatakis substitutionbetweensurfaceanddeep-waterstrains et al., 2008). Alignments for trees in Figures 1 and 5 of SAR11 were analyzed as described in were curated for poorly aligned sites using Gblocks Konstantinidisetal.(2009).Fold-changeabundance (Castresana,2000).ProtTest(Abascaletal.,2005)was of amino acids across similar and non-similar utilized to optimize amino-acid substitution model- substitutionswascalculatedfromallvsallBLASTP ingforprotein-codingtrees.Theconcatenatedprotein outputwithinhomologousclusters.Intergenicspacer phylogeny of the SAR11 clade was completed using regions were provided as part of the IMG anno- the Hal pipeline (Robbertse et al., 2011). The tation process. Distribution of intergenic regions proteorhodopsin tree was computed using the itera- was examined in R (http://www.R-project.org). tive Bayesian alignment/phylogeny program HandA- Transposable elements were assessed using lign (Westesson et al., 2012). Detailed methodology TBLASTN and the sequences collected by Brian for every tree, unaligned fasta files for each of the Haas of the Broad Institute for the program Transpo- singlegenetrees,andthesuperalignmentandmodel sonPSI (http://transposonpsi.sourceforge.net). Clus- file for the concatenated protein tree,areprovidedin teredregularlyinterspacedshortpalindromicrepeats the Supplementary Information. (CRISPRs) were detected as part of the automated IMG annotation process. A search for cas genes was Metagenomics conducted using 78 hidden Markov models (HMMs) DNA was extracted from microbial biomass developed by Haft et al. (2005) and Makarova et al. collected from BATS in August 2002 across a Figure1 Maximum-likelihoodtreeofthe16SrRNAgenefortheSAR11cladeinthecontextofotherAlphaproteobacteria.Genome sequencedstrainsareinbold,withsubcladeIcsequencesinredandotherSAR11sequencesinblue.Bootstrapvalues(n¼1000)are indicatedatthenodes;scalebarrepresentschangesperposition. TheISMEJournal DeepwaterSAR11bathytypegenomics JCThrashetal 1443 depth profile (0, 40, 80, 120, 160, 200 and 250m) and Normalized aggregate sequenced using454 pyrosequencing(GS-FLX,Roche, reciprocal best blast hits (%) Basel, Switzerland). Data is available at CAMERA 0 20 40 60 80 100 (https://portal.camera.calit2.net)underCAM_PROJ_ Ia Ic IIIaVa BATS. Metagenomes from ALOHAwere previously ALOHA 25m described in Shi et al. (2011). Data were also ALOHA 500m analyzed from 454 metagenomic sequences col- lected from Eastern Tropical South Pacific Oxygen ALOHA 770m Minimum Zone (Stewart et al., 2012), the Puerto ALOHA 1000m Rico Trench (Eloe et al., 2011a), the Sea of Marmara ETSP OMZ 15-110m (Quaiseretal.,2010)andtheMatapan-VavilovDeep in the Mediterranean Sea (Smedile et al., 2013). ETSP OMZ 200m All raw data were trimmed of low-quality end ETSP OMZ 500m sequences using Lucy (Chou and Holmes, 2001) ETSP OMZ 800m andde-replicatedusingCDHIT-454(Fuetal.,2012). Sanger-sequenced reads from 4000m at ALOHA BATS 0-80m (Konstantinidis et al., 2009) were also analyzed but BATS 120-160m not compared with the 454 pyrosequenced reads. BATS 200-250m GOS (Venter et al., 2004; Rusch et al., 2007; Brown et al., 2012) surface sequences were analyzed for PRT 6000m temperature dependence of subclade Ic abundance, MARM 1000m but also not included in gene relative abundance MATA 4908m normalizations (Supplementary Information). Comparativerecruitmentofmetagenomicsequences Figure 2 Relative abundance of SAR11 subclades based on wascompletedusingareciprocalbestBLAST(rbb;for reciprocalbestblastrecruitmentofmetagenomicsequences. example, Wilhelmetal.,2007)of eight SAR11 isolate genomes (HTCC1062, HTCC1002, HTCC9565, HTCC7211, HIMB5, HIMB114, IMCC9063, HIMB59) phylogenetically affiliated with a single clone and thefourSAR11 SAGs.Each concatenated SAR11 branching between SAR11 subclades Ia/Ib and sub- genomesequencewassearchedagainsteachmetagen- cladesIIa/IIb, termedsubclade Ic (Vergin et al., 2013; ome database with BLASTN. All hits to SAR11 Figure 1). Phylogenetic evaluation of SAR11-type genomes were then searched against the entire IMG SAG 16S rRNA gene sequences demonstrated a database (v400), containing the 12 SAR11 genome congruent topology, with a monophyletic group sequences using BLASTN. The best hits to each of SAGs collected from mesopelagic samples corre- genome after this reciprocal blast were then normal- sponding tothe subclade Ic position (Supplementary izedbygenelength,theaveragenumberofsequences FigureS1). Four SAGs were selected to represent the and relative abundance of SAR11 per sample. Taxo- breadth of the clade, determined by branch lengths nomic relative abundance for SAR11 and non-SAR11 (Supplementary Figure S1). The 16S rRNA gene organismswasestimatedwithmetagenomicbest-blast sequences from the SAGs formed a monophyletic hits to whole-genome sequences in the IMG v400 group with the subclade Ic clone from (Vergin et al., database. The results presented in Figure 2 represent 2013)basaltosubcladesIa/b(Figure1).AllfourSAGs anaggregationofallnormalizedmetagenomicrecruit- were isolated from a single station ALOHA sample ment for all genomes in a given subclade, divided by taken at 770m. the total number of SAR11 hits in that sample. Recruitment of metagenomic 454 pyrosequences Gene clusters that may putatively have a role in from Station ALOHA, the Eastern Subtropical depth adaptation in subclade 1c were identified Pacific oxygen minimum zone (ESTP OMZ) and as follows: metagenomic samples were classified as BATS indicated a higher relative abundance of ‘surface’ (o200m) or ‘deep’ (X200m) and gene subclade Ic in the mesopelagic compared to the cluster abundance in surface and deep samples was euphotic zone (Figure 2, Supplementary Figures determined by reciprocal best-BLAST. The R pack- S2–S4), and greater relative abundance in age DESeq (Anders and Huber, 2010) was used to the 6000m Puerto Rico Trench (PRT) metagenomic identify genes that were statistically significantly dataset compared with other subclades enriched at depth and at the surface. Detailed (Supplementary Figure S5). The Sea of Marmara workflowsforthemetagenomicanalysesareavailable (MARM) dataset showed similar distributions in Supplementary Information. between subclade Ia (predominantly HTCC1062 type) and Ic (Supplementary Figure S6), and Results and discussion althoughtheMatapan-VavilovDeep(MATA)dataset had very little recruitment to any SAR11 genome SubcladeIcrelativeabundanceinmetagenomicdatasets (Supplementary Figure S7), consistent with the Previous results demonstrated an abundance of previous analysis (Smedile et al., 2013), those upper mesopelagic 16S rRNA gene sequences sequences that did recruit to SAR11 genomes were TheISMEJournal DeepwaterSAR11bathytypegenomics JCThrashetal 1444 predominantly Ic-like. Longer Sanger shotgun- oxidase as the sole predicted terminal reductatse, a sequencing reads from 4000m at Station ALOHA completetricarboxylicacidcycle,conservedlesions (Konstantinidis et al., 2009) also demonstrated in several glycolytic pathways (Schwalbach et al., increased recruitment to the SAGs relative to other 2010), a reliance on reduced sulfur compounds genomes in deeper water (Supplementary Figure S8). (Trippetal.,2008)andpathwaysforthemetabolism We tested whether the increased abundance at depth and oxidation of small organic molecules such as mightbeduetotemperaturedependence.Recruitment amino/carboxylic acids and one-carbon and methy- fromtheGOSdataset(Venteretal.,2004;Ruschetal., lated compounds (Yilmaz et al., 2011; Grote et al., 2007;Brownetal.,2012)consistentlyshowedadearth 2012; Carini et al., 2012; Supplementary Table S1). of subclade Ic abundance relative to Ia in surface Also consistent with previous findings about the waters around the globe, and did not support the Pelagibacterales(Groteetal.,2012),theIcSAGshad conclusion that subclade Ic abundance at depth was an unusually high conservation of local synteny driven by temperature (Supplementary Information). among SAR11 genes (Figure 4). When compared among themselves, the Ic SAGs had less local synteny than most organisms at that level of 16S Comparisons with surface SAR11 genomes rRNA gene identity. However, we attributed this to TheSAGshadtotalassemblysizesbetween0.81and theSAGsbeingincompleteandfragmented,because 1.40Mbp spanning 81–151 scaffolds 4500bp, GC when the SAGs were compared with other SAR11 content between 29% and 30%, and coded for genomes, syntenic genes were a characteristically 948–1621 genes (Table 1). Estimated genome com- high percentage of the total shared genes. High pleteness, using 599 SAR11-specific single-copy amounts of local synteny may seem unlikely given orthologs (Supplementary Table S1), was between predictedSAR11recombinationratesareamongthe 55% and 86% with the corresponding estimated highest measured for prokaryotes (Vergin et al., averagegenomesizeforthesubcladeIcorganismsat 2007;VosandDidelot,2009),however,itwasshown 1.49±0.09Mbp.Protein-codingorthologousclusters previously that much of the rearrangement within fortheSAGsandeightisolateSAR11genomeswere genomes occurs at operon boundaries, and thus determined by all vs all BLASTP and Markov localsyntenyisnotdisrupted(Wilhelmetal.,2007). clustering using the automated pipeline Hal Further, the rates in Vergin et al. (2007) were (Robbertse et al., 2011) and custom filters for length restricted to closely related organisms within sub- and synteny. Of the 3158 total orthologous clusters clade Ia. in the 12 SAR11 genomes, 1764 (56%) were present Although gene content and local gene order in at least one SAG, and 69% of the orthologous conservation between the isolate genomes and the clusters found in the SAGs were shared with SAGs was high, the SAGs were distinct at the between one and eight other SAR11 genomes. COG amino-acidlevel.Aconcatenatedproteinphylogeny distributionamongtheSAGswasgenerallythesame using 322 single-copy orthologs supported the 16S asinsurfacegenomes,exceptforcategoriesMandP phylogeny, placing the subclade Ic SAGs as a (Figure3,SupplementaryFigureS9,seebelow).The monophyleticsistergrouptothesubcladeIasurface majority of Ic-specific genes were hypothetical strains (Figure 5a). The divergence from other (Supplementary Table S1), although several notable strains and the depth of branching within the Ic-specificgeneswerepresent(seebelow).Aswould subcladeIcsupportedconceptualizationofsubclade be expected from a low percentage of unique genes Ic as a new genus of SAR11, separate from the in the SAGs, much of the metabolism of these subclade Ia, or Pelagibacter genus (Grote et al., organisms appeared to be similar to that of the 2012).Comparisonofaverageamino-acididentityvs surface strains, particularly the subclade Ia organ- 16S rRNA gene identity was also in accordance isms. Collectively, the Ic subclades were predicted with the metrics proposed by Konstantinidis and to be obligate aerobic organisms, with cytochrome c Tiedje (2007) for delineation of genera (66%–72% Table1 SubcladeIcSAGgenomecharacteristics Genome AAA240-E13 AAA288-E13 AAA288-G21 AAA288-N07 OtherSAR11a Numberofscaffolds 151 106 139 81 — Assemblysize(Mbp) 1.40 0.81 0.91 0.95 — Est.genomecompleteness(%) 86 55 64 66 — Est.genomesize(Mbp) 1.62 1.48 1.43 1.44 1.29–1.41* GCcontent(%) 29 29 30 29 29–32 Numberofgenes 1621 948 1103 1110 1357–1576 Numberofgenes(prot.cod.) 1581 923 1074 1083 1321–1541 Abbreviations:IMG,IntegratedMicrobialGenome;SAG,single-amplifiedgenome. aValuesfromGroteetal.(2012)andIMG. *Actual(notestimated)sizes. TheISMEJournal DeepwaterSAR11bathytypegenomics JCThrashetal 1445 AAA280-E13 HTCC1062 HIMB5 isoleucine, lysine, asparagine, arginine and trypto- AAA288-E13 HTCC1002 HIMB114 15 AAA288-G21 HTCC9565 IMCC9063 phaninthepredictedsubcladeIcproteinsequences AAA288-N07 HTCC7211 HIMB59 at the expense of alanine, aspartatic acid, glutamic acid, methionine, glutamine, threonine and valine es (Figure 6, Supplementary Figure S10). n 10 ge Many of the previously reported features asso- of e ciated with deep-ocean adaptation in microorgan- g nta isms were not observed in the SAGs, such as rRNA e erc 5 geneinsertions,increased transposable elements,or P genes for chemoautotrophy (see Supplementary Information for detailed discussion). Nevertheless, 0 there were still some distinguishing characteristics E G D N M B H Z V C S R P U I F O L Q T K J between subclade Ic and surface strains at the Figure 3 COG distribution as a percentage of total genes whole-genomelevelthatweresimilartoormatching assigned to COGs. Y axis: percentage of genes, x axis: COG those previously observed in deep ocean metage- categories.Colorscorrespondtothegenomesaccordingtothekey. nomicdatasets(DeLongetal.,2006;Konstantinidis Asterisksindicatecategorieswithdifferentialdistributioninthe etal.,2009) and comparative genomicsstudies. The SAGs relative to the isolate genomes.E, amino-acid metabolism subclade Ic genomes had a small, but statistically andtransport;G,carbohydratemetabolismandtransport;D,cell division and chromosome partitioning; N, cell motility and significantly increase in intergenic space secretion; M, cell wall/membrane/envelope biogenesis; (Supplementary Figure S11) and a slightly (but B,chromatinstructureanddynamics;H,coenzymemetabolism; statistically insignificant) higher estimated average Z, cytoskeleton; V-, C, energy production and conversion; genome size than that of current surface genomes S, unknown function; R, general function prediction only; P,inorganiciontransportandmetabolism;U,intracellulartraffick- (1.49±0.09 vs 1.33±0.07 Mbp, Supplementary ingandsecretion;I,lipidmetabolism;F,nucleotidetransportand Table S1). Also, consistent with Konstantinidis et al. metabolism; O, posttranslational modification, protein turnover, (2009) and a general trend toward larger genomes chaperones; L, DNA replication, recombination and repair; Q, in deeper samples, there were more gaps in the secondary metabolite biosynthesis, transport and catabolism; surface strain ortholog alignments (Supplementary T,signaltransductionmechanisms;K,transcription;J,translation. Figure S10), indicating insertions and thus larger codingregionsinthesubcladeIcopenreadingframes. Unlike the surface strains, three of the four SAGs 1.0 HIMB114 vs. IMCC9063 showed a statistically significant enrichment in (subclade IIIa) category M, cell wall/membrane/envelope biogenesis subclade IIIa vs. subclades Ia/Ic HTCC1062 vs. (Figure 3 and Supplementary Figure S9). An increase 0.8 (n=18) HTCC1002 in COG M genes was previously noted in the deep n HTCC9565 vs. ocean Photobacterium profundum SS9 relative to ervatio 0.6 vs.H aIMll (Bn5=911) HHTTCCCC11000622 & m20e0s8o)phanildicinVibaridoeneapcewaeatesrtraeicnosty(pCeamofpaAnlaterroometonaal.s, s con othceorm spuabrcislaodnes Ia subclade Ic macleodii (Ivars-Martı´nez et al., 2008). COG M genes der (n=7) com(pna=r6is)ons enriched in the SAGs include glycosyltransferases, or 0.4 methyltransferases, sugar epimerases, a sialic acid e Gen subclade Ia vs. synthase, the cellular morphology gene ccmA (Hay subclade Ic et al., 1999) and polysaccharide export proteins (n=20) 0.2 (Supplementary Table S1). The SAGs also showed a significantreductionofCOGPgenesforinorganicion transport and metabolism that may reflect increased 0.0 relianceonorganicNandPsources.Insupportofthis hypothesis, none of the SAGs had homologs of the 0.2 0.4 0.6 0.8 1.0 phosphate metabolism genes phoU, pstS, pstA or Average normalized bit score pstC, and although they had predicted ammonia Figure 4 Local synteny in SAR11 genomes. The percentage of permeasesthatclusteredwithammoniumtransporters genes in conserved order relative to the total number of shared (clusters 150010.f.ok and 1500936.f.ok), none had genes(geneorderconservation)vsaveragenormalizedbitscoreof genes annotated as an ammonium transporter. the shared amino-acid content. Red dots are all pairwise Furthermore, the SAGs had unique genes for purine comparisons of SAR11 genomes, the total in a given area indicated by n. Data are overlaid on that from Yelton et al. degradationtoammonia(SupplementaryFigureS12), (2011;opengraycircles). including a 2-oxo-4-hydroxy-4-carboxy-5-ureidoimi- dazoline (OHCU)decarboxylasethatwasspecificto, andconservedin,allfourSAGs,possiblyindicatinga amino-acid identity; Grote et al., 2012; Figure 5b). clade-specific nitrogen salvage pathway. Specific amino-acid substitution patterns among There were also indications of unique phage orthologs shared between the SAGs and the surface interactions and defense mechanisms in subclade genomes showed relative increases in cysteine, Ic compared with the surface strains, consistent TheISMEJournal DeepwaterSAR11bathytypegenomics JCThrashetal 1446 16S rRNA gene identity (%) 99.9 99.9 99 98.5 95.2 93.6 95.5 94.5 88.3 88.6 82.1 HTCC1062 HTCC1062 98.4 99.8 98.9 98.4 95.1 93.7 95.4 94.4 88.3 88.5 82 HTCC1002 HTCC1002 %) 83.9 83.9 99 98.5 95.2 93.8 95.5 94.5 88.5 88.6 82 HTCC9565 Ia HTCC9565 y ( 75.5 75.9 76.2 98.7 94.9 93.5 95 94 88.1 88.8 82 HTCC7211 ntit HTCC7211 e 74.3 74 73.2 75.3 95.6 93.8 95.8 94.5 88.6 88.4 82 HIMB5 92 d HIMB5 cid i 63 63 62.2 62.3 62.2 98 98.6 98.7 89.8 88.1 84.7 AAA240-E13 a AAA288-N07 no 61.7 61.8 60.7 61.4 60.5 82.1 96.9 96.6 88 88.1 82.1 AAA288-E13 AAA288-G21 mi a 62.6 62.6 62.3 62.2 61.7 79.4 80.8 98 89.1 88.5 82.5 AAA288-G21 Ic e AAA288-E13 g a 62.8 62.8 61.9 62.1 62.3 80.6 80.7 82.4 88.7 88.1 82.4 AAA288-N07 r e AAA240-E13 v a 54.7 54.5 54.9 54.9 55.3 54.7 53.7 54.3 54.3 92.9 82.6 HIMB114 IIIa IMCC9063 56 56 55.6 55.6 55.3 55.5 54.6 55.6 54.9 63.6 82.2 IMCC9063 HIMB114 46.5 46 45.8 46.4 46.6 45.3 45.2 45.5 45.4 45.2 45.2 HIMB59 V HIMB59 2 2 5 1 5 3 3 1 7 4 3 9 6 0 6 1 B 1 1 2 0 1 6 5 0.09 0 0 5 2 M E E G N 1 0 B TCC1 TCC1 TCC9 TCC7 HI A240- A288- A288- A288- HIMB MCC9 HIM 507090 H H H H A A A A I A A A A Figure5 (a)MaximumlikelihoodtreeoftheSAR11cladeusing322concatenatedproteins.SubcladeIchighlightedinblue.Allnodes had100%bootstrapsupportunlessotherwiseindicated.Scalebarindicateschangesperposition.RootwasinferredfromThrashetal. (2011)andFigure1.(b)Averageamino-acididentityvs16SrRNAgeneidentity.Colorscorrespondtovaluesineachcellaccordingtothe key.Dashedlineindicatesgenus-levelboundariesaccordingtoKonstantinidisandTiedje(2007).Note,AAA240-E13hasonlyapartial 16SrRNAgenesequence,allothersarefull-length(seeSupplementaryInformation). e scaffold 14, bases 24207-26268 ac All a.a. substitutions urf 100 s e h n t 90 e i or M 80 70 60 ETSP OMZ surface ALOHA 500m ETSP OMZ 200m ALOHA 770m ETSP OMZ 500m PRT Gs 50 ALOHA 1000m ETSP OMZ 800m BATS 200-250m A S e h n t More i AlaCysAspGluPheGlyHisIleLysLeuMetAsnProGlnArgSerThrValTrpXTyrgap Figure 7 Recruitment of metagenomic sequences to the pre- Figure 6 Fold change in amino-acid substitutions between the dicted CRISPR region. Upper box represents a magnification of SAGs and the surface genomes. Pair-wise substitutions were thegenomicregiononscaffold14indicatedinthetitle.Eachline quantified based on BLAST alignments of homologs between isametagenomicsequencewithreciprocalbesthits(rbhs)tothis surfacegenomesandSAGs.X,unknowncodons. region,organizedby%identity(yaxis)andsample(color).Those samples not appearing in the analysis either had only rbhs o50bpornorbhs. withpreviousstudiesshowingenrichmentofphage genes at depth (Martin-Cuadrado et al., 2007; Konstantinidis et al., 2009). The SAGs had unique HMMs developed by (Haft et al., 2005; Makarova phage integrases and phage protein D genes et al., 2011) found some evidence for a cas4-like (Supplementary Table S1), and AAA240-E13 con- gene currently annotated as a hypothetical tained a predicted CRISPR region (Makarova et al., protein, conserved in three SAGs and HTCC9565 2011) on scaffold 14 (Figure 7). A search for (Supplementary Table S1, cluster 15001317). corresponding CRISPR-associated (cas) genes using In AAA240-E13, this cas4-like protein was on TheISMEJournal DeepwaterSAR11bathytypegenomics JCThrashetal 1447 scaffold 18 and thus not located directly nearby the 6 CRISPR.Widespread Pelagiphagethat infectatleast a subset of the known surface strains have been recently discovered (Zhao et al., 2013), but this is 4 the only putative CRISPR locus identified so far in SAR11 genomes. Detailed analysis showed that this region had recruitment of metagenomic sequences ge 2 n mostly from the mesopelagic samples of station a h ALOHA, indicating that the CRISPR is relatively d c 0 specific, geographically (Figure 7). The observed ol increase in subclade Ic COG M genes may also have g f2 o −2 a role in phage defense (Rodriguez-Valera et al., l 2009). −4 Gene-specific relative abundance in metagenomic −6 datasets We used metagenomic data to evaluate the relative 1e−02 1e−01 1e+00 1e+01 1e+02 importance of SAG genes in situ, postulating that genes with little or no recruitment could be mean of normalized counts discounted as being present in fewer organisms, Figure8 Plotofnormalizedmeanvslog-foldchangeforsurface whereasthosewithhighlevelsofrecruitmentcould vsdeepgeneclusters. be inferred as being the most conserved, and therefore most important, to Ic-type organisms. Broadly, patterns of differential gene abundance metagenomicdatasets(SupplementaryTableS1).A between the SAR11 subclades could be identified predicted adenosine deaminase, unique to the across data sets. In most of the deep water samples, SAGs, was highly abundant in deep samples. This SAGsformedstatisticallysignificantgroupingbased gene works upstream of xanthine dehydrogenase on hierarchical clustering of recruitment profiles, (also significantly more abundant) in purine degra- indicating that these genomes are highly similar dation, and although not statistically significant, based on relative abundance of reciprocal best blast other elements of the putative subclade Ic-specific hits in deep-water environments (Supplementary purine degradation pathway, including the 2-oxo-4- Figure S13). The relative abundances of every gene hydroxy-4-carboxy-5-ureidoimidazoline decarboxy- for each SAG are reported in Supplementary lase, had high recruitment in deep samples Table S1 for all normalized datasets. Thirty-nine compared with surface samples. Putative pillin clusters showed significantly higher relative abun- assembly (pilF) genes, shared with other SAR11s, danceofmetagenomicsequencerecruitmentindeep were also significantly more abundant in deep water (those at 200m and below) compared with water samples, as were several methyltransferases, surface datasets (Figure 8, Supplementary a Naþ/proline symporter and a high-affinity Information). Only two of these clusters did not Fe2þ/Pb2þ permease. contain SAG genes; whereas, of the 42 clusters that Sulfite oxidase genes, conserved in three SAGs were significantly more abundant in surface sam- and shared only with HTCC9565, showed more ples, only two contained SAG genes and the rest recruitment in deep water samples, and were were exclusive surface genomes. Half of the deep located directly adjacent to a cytochrome in the abundance clusters were exclusive to the SAGs, the same configuration as the sorAB genes with proven otherhalfhadsomeshareddistributionbetweenthe sulfite oxidase activity in Starkeya novella ATCC SAGs and surface genomes (Supplementary 8083T (Kappler et al., 2000, 2012). The predicted Table S1). AAA240-E13 sulfite oxidase had 33% identity with Of the 19 of these clusters that were specific to the S. novella SorA protein (blastp). Nearby were subclade Ic, 9 were annotated as hypothetical genes encoding for predicted Fe-S proteins, molyb- proteins. A subclade Ic-specific cluster of putative dopterin biosynthesis enzymes and molybdenum Fe-S oxidoreductases contained multiple copies cofactor synthesis (Mo and heme are required from each SAG, and all of the SAGs also had cofactors (Kappler et al., 2000; Aguey-Zinsou et al., multiple copies of uncharacterized genes that 2003)), which also appeared qualitatively more clustered with single copies of predicted membrane abundantindeepwatersamples.Thismaytherefore occupation and recognition nexus (MORN) repeat indicate a mechanism for sulfur chemolithotrophy genes from the subclade Ia genomes. The gene in subclade Ic and HTCC9565. Utilization of expansions for both these clusters suggested the partially reduced sulfur compounds could also proteins were important in the Ic subclade and in potentially explain the high abundance of SAR11 support of this hypothesis both were among organismsandSAR11-typeadenosinephosphosulfate the clusters significantly more abundant in deep reductase (aprAB) genes found in the ESTP OMZ, TheISMEJournal DeepwaterSAR11bathytypegenomics JCThrashetal 1448 particularly at 200m where dissolved oxygen is Are subclade Ic SAR11 truly piezophilic (growth lowest and sulfur cycling has been identified rates increasing with pressure from 1 to 500atm (Figure 2; Canfield et al., 2010; Stewart et al., (MadiganandMartinko,2006)),oraretheyprimarily 2012). The aprAB genes were found in all adapted to the shallower mesopelagic zone (piezo- subclade Ia and two of the subclade Ic genomes tolerant)? Although the ALOHA 4000m and PRT (Supplementary Table S1), and had high metagenomic analyses demonstrated subclade Ic abundances in most of the deep water samples and organisms can be found in abysso- and hadopelagic higher abundance in deep vs shallow samples in realms, the lack of additional data from extreme datasetsfromthesamewatercolumn.Giventhelack deep water sites leaves the abundance of Pelagibac- of additional genes in the assimilatory sulfate terales subclade Ic in such locations in question. reduction pathway in most SAR11 organisms (there Further,many previouslyidentifiedfeatures ofboth was a predicted sat gene in HTCC9565 (Grote et al., piezophilic isolates and deep ocean single-cell 2012)), aprAB have been proposed to have a role in genomes (Simonato et al., 2006; Lauro and Bartlett, taurine metabolism (Williams et al., 2012), and may 2008; Nagata et al., 2010; Swan et al., 2011) are serve as a key sulfur cycling process for SAR11 in absentintheSAR11SAGs.Althoughtheincomplete deep water as well. Our results indicate that the state of the SAGs leaves open the possibility that observed abundance of aprAB in the ESTP OMZ thesefeaturesmaybecontainedintheunsequenced may be due to subclade Ic, rather than subclade Ia portion of the genomes, their absence in the nearly organisms. complete of AAA240-E13 SAG implies that even if Metagenomic relative abundance measurements present in some SAR11 Ic organisms, they are not allowed us to evaluate the potential importance of universally conserved by the subclade. Alterna- other notable genes found in the SAGs. Two, tively, previously described features of deep ocean AAA288-G21 and AAA288-N07, contained pre- isolates may not be common to all piezophiles, and dictedcopiesofproteorhodopsin,unexpectedgiven some piezophilic adaptations may not be directly the predominance of subclade Ic below the photic observable at the level of nucleic acid or protein zone. The phylogeny of the proteorhodopsin genes sequence variation. For example, many, but not all, generally matched the topology of the species tree piezophilescontainpolyunsaturatedacids,andcold (Supplementary Figure S14) and these loci showed or high pressure adaption can also be achieved by modestrecruitmentinmanyofthesamplesforboth changing the ratio of unsaturated to saturated strains (Supplementary Table S1), indicating that monounsaturated fatty acids in membrane lipids thesubcladeIcmaycycletotheeuphoticzonewith (DeLong and Yayanos, 1985). Such properties are enough frequency, as a population, for the physio- not readily predictable from genomes. Finally, logical benefits of retaining proteorhodopsin to be as these SAGs were isolated from 770m, a depth realized. Many of the unique or unexpected SAG that does not usually represent a piezophilic genes with annotations were located in hypervari- environment, the possibility exists that the Ic able regions (genomic islands), where there was subclade may have further bathytype divisions, little or no recruitment of metagenomic sequences including true piezophiles that occupy the deeper (Coleman, 2006; Wilhelm et al., 2007; Tully et al., realms. 2011; Grote et al., 2012; Supplementary Table S1). The evidence herein suggests these are a piezo- AAA240-E13 and AAA288-E13 had copies of tolerantsubclade,withmetabolismsimilartothatof predicted flagellar proteins, including a motor surface subclades focused on aerobic oxidation of switchproteinandabasal-bodyP-ringproteinlocated organic acids, amino acids, and C1 and methylated together, and AAA240-E13 additionally had a puta- compounds, universal products of metabolism that tiveflagellarbiosynthesis/typeIIIsecretorypathway are expected to be found in all biomes, and may protein. However, the first two genes showed no contain mechanisms for nitrogen salvage and sulfur recruitment in any of the metagenomic data sets, chemolithotrophy unusual in most surface SAR11 and the third had recruitment in only one, indicat- genomes.Theyalsoappeartohavebeenevolvingas ing that they were unlikely to be a common trait an environmentally isolated subclade for long among subclade Ic strains (Supplementary enough to show distinct signatures at the genome Table S1). AAA240-E13 had the first mismatch level. Thus, we can affirm our hypothesis: the repair (mutS) family homolog found in a SAR11 subclade Ic SAGs did contain genomic features that genome(Viklundetal.,2012),butittoowaslocated distinguished them from the surface SAR11 gen- in a hypervariable region. omes, although these features were generally more subtlethanlarge-scalegenecontentvariations.They had larger intergenic regions and larger coding Summary regions in SAR11 clade orthologs, had a slightly The results of our metagenomic analyses from a larger estimated average genome size, were distinct variety of locations strongly support the conclusion phylogenetically and at the amino-acid content thatthesubcladeIcorganismsareautochthonousto level, were enriched and depleted in COG M and P the deep ocean. However, this raises the question, genescomparedwithotherSAR11genomes,respec- whatarethedepthstowhichtheyarebestadapted? tively, and contained clade-specific hypothetical TheISMEJournal DeepwaterSAR11bathytypegenomics JCThrashetal 1449 genes with increased relative abundances in oligotroph ’Candidatus Pelagibacter ubique’ deep water samples. Further examination of such HTCC1062 ona definedmedium. ISME J7:592–602. hypothetical genes and cultivation successes CarlsonCA,MorrisR,ParsonsR,TreuschAH,GiovannoniSJ, with deep ocean SAR11 strains will help provide Vergin K. (2009). Seasonal dynamics of SAR11 populations in the euphotic and mesopelagic zones a mechanistic explanation for how the features of thenorthwestern Sargasso Sea.ISME J 3:283–295. described by this study contribute to the predomi- Castresana J. (2000). Selection of conserved blocks from nance of subclade Ic organisms in deeper water. multiple alignments for their use in phylogenetic analysis. MolBiol Evol 17:540–552. Chou HH, Holmes MH. (2001). DNA sequence quality Conflict of Interest trimming and vector removal. Bioinformatics 17: 1093–1104. The authors declare no conflict of interest. ColemanML.(2006).Genomicislandsandtheecologyand evolutionofProchlorococcus.Science311:1768–1770. DeLong EF, Yayanos AA. (1985). Adaptation of the Acknowledgements membrane lipids of a deep-sea bacterium to changes in hydrostaticpressure. Science 228:1101–1103. ThisworkwassupportedbytheGordonandBettyMoore DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, Foundation (SJG and EFD), the US Department of Energy Frigaard NU et al. (2006). Community genomics Joint Genome Institute (JGI) Community Supported Pro- amongstratifiedmicrobial assemblagesin the ocean’s gram grant 2011-387 (RS, BKS, EFD, SJG), National interior.Science 311:496. ScienceFoundation(NSF)ScienceandTechnologyCenter EddySR.(2011).AcceleratedProfileHMMSearches.PLoS Award EF0424599 (EFD), NSF awards EF-826924 (RS), ComputBiol 7:e1002195. OCE-821374(RS)andOCE-1232982(RSandBKS),andis EdgarRC.(2004).MUSCLE:multiplesequencealignment based on work supported by the NSF under Award no. with high accuracy and high throughput. Nucleic DBI-1003269(JCT).SequencingwasconductedbyJGIand AcidsRes 32:1792–1797. supported by the Office of Science of the US Department EloeEA,FadroshDW,NovotnyM,ZeiglerAllenL,KimM, of Energy under Contract No. DE-AC02-05CH11231. We Lombardo M-J et al. (2011a). Going deeper: metagen- thank Christopher M Sullivan and the Oregon State omeofahadopelagicmicrobialcommunity.PLoSOne University Center for Genome Research and Biocomput- 6:e20388. ing, as well as the Louisiana State University Center for EloeEA,ShulseCN,FadroshDW,WilliamsonSJ,AllenEE, Computation and Technology for vital computational Bartlett DH. (2011b). Compositional differences in resources. We also thank Kelly C Wrighton and Laura A particle-associated and free-living microbial assem- Hug for critical discussions about single-cell genomics, blages from an extreme deep-ocean environment. metagenomics andmetabolicreconstruction. EnvironMicrobiolRep3:449–458. FieldK,GordonD,WrightT,RappeM,UrbachE,VerginK etal.(1997).Diversityanddepth-specificdistribution ofSAR11clusterrRNAgenesfrommarineplanktonic bacteria.Appl EnvironMicrobiol 63:63–70. References Fu L, Niu B, Zhu Z, Wu S, Li W. (2012). CD-HIT: accelerated for clustering the next-generation sequen- AbascalF,ZardoyaR,PosadaD.(2005).ProtTest:selection cing data.Bioinformatics 28:3150–3152. ofbest-fitmodelsofproteinevolution.Bioinformatics Giovannoni SJ, Rappe MS, Vergin KL, Adair NL. (1996). 21:2104–2105. 16S rRNA genes reveal stratified open ocean bacter- Aguey-ZinsouK-F,BernhardtPV,KapplerU,McEwanAG. ioplanktonpopulationsrelatedtothegreennon-sulfur (2003). Direct electrochemistry of a bacterial sulfite bacteria.ProcNatl AcadSci USA93:7979–7984. dehydrogenase. JAmChemSoc 125:530–535. Giovannoni SJ, Vergin KL. (2012). Seasonality in ocean AndersS,HuberW.(2010).Differentialexpressionanalysis microbialcommunities. Science 335:671–676. forsequencecountdata.GenomeBiol11:R106. GordonDA,GiovannoniSJ.(1996).Detectionofstratified Arıstegui J, Gasol JM, Duarte CM, Herndl GJ. (2009). microbial populations related to Chlorobium and Microbial oceanography of the dark ocean’s pelagic FibrobacterspeciesintheAtlanticandPacificoceans. realm.LimnolOceanogr 54:1501–1529. Appl EnvironMicrobiol 62: 1171–1177. Blainey PC. (2013). The future is now: single-cell Grote J, Thrash JC, Huggett MJ, Landry ZC, Carini P, genomics of bacteria and archaea. FEMS Microbiol Giovannoni SJ et al. (2012). Streamlining and core Rev 37:407–427. genome conservation among highly divergent mem- Brown MV, Lauro FM, DeMaere MZ, Les M, Wilkins D, bers of theSAR11 Clade.mBio 3:e00252–00212. ThomasTetal.(2012).GlobalbiogeographyofSAR11 Haft DH, Selengut J, Mongodin EF, Nelson KE. (2005). marine bacteria.MolSys Biol 8:1–13. Aguildof45CRISPR-associated(Cas)proteinfamilies CampanaroS,TreuL,ValleG.(2008).Proteinevolutionin and multiple CRISPR/Cas subtypes exist in prokar- deepseabacteria:ananalysisofaminoacidssubstitu- yotic genomes.PLOS Comput Biol 1:e60. tion rates. BMCEvol Biol 8:313. Hay NA, Tipper DJ, Gygi D, Hughes C. (1999). A novel Canfield DE, Stewart FJ, Thamdrup B, De Brabandere L, membrane protein influencing cell shape and multi- DalsgaardT,DeLongEFetal.(2010).Acrypticsulfur cellularswarmingofProteusmirabilis.JBacteriol181: cycleinoxygen-minimum-zonewatersofftheChilean 2008–2016. coast.Science 330:1375–1378. Ivars-Mart´ınezE,Martin-CuadradoA-B,D’AuriaG,MiraA, Carini P, Steindler L, Beszteri S, Giovannoni SJ. (2012). Ferriera S, Johnson J et al. (2008). Comparative Nutrient requirements for growth of the extreme genomics of two ecotypes of the marine planktonic TheISMEJournal
Description: