TheISMEJournal(2016)10,2702–2714 OPEN ©2016InternationalSocietyforMicrobialEcology Allrightsreserved 1751-7362/16 www.nature.com/ismej ORIGINAL ARTICLE RubisCO of a nucleoside pathway known from Archaea is found in diverse uncultivated phyla in bacteria Kelly C Wrighton1,7, Cindy J Castelle2,7, Vanessa A Varaljay1, Sriram Satagopan1, Christopher T Brown3, Michael J Wilkins1,4, Brian C Thomas2, Itai Sharon2, Kenneth H Williams5, F Robert Tabita1 and Jillian F Banfield2,6 1Department of Microbiology, The Ohio State University, Columbus, OH, USA; 2Department of Earth and PlanetaryScience,UCBerkeley,Berkeley,CA,USA;3DepartmentofPlantandMicrobialBiology,UCBerkeley, Berkeley, CA, USA; 4School of Earth Sciences, The Ohio State University, Columbus, OH, USA; 5Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA and 6Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA, USA MetagenomicstudiesrecentlyuncoveredformII/IIIRubisCOgenes,originallythoughttoonlyoccurin archaea,fromuncultivatedbacteriaofthecandidatephylaradiation(CPR).TherearenoisolatedCPR bacteriaandtheseorganismsarepredictedtohavelimitedmetaboliccapacities.Hereweexpandthe knowndiversityofRubisCOfromCPRlineages.WereportaformofRubisCO,distantlysimilartothe archaeal form III RubisCO, in some CPR bacteria from the Parcubacteria (OD1), WS6 and Microgenomates (OP11) phyla. In addition, we significantly expand the Peregrinibacteria (PER) II/III RubisCO diversity and report the first II/III RubisCO sequences from the Microgenomates and WS6 phyla.Toprovideametaboliccontextfor theseRubisCOs,wereconstructednear-complete(493%) PER genomes and the first closed genome for a WS6 bacterium, for which we propose the phylum nameDojkabacteria.GenomicandbioinformaticanalysessuggestthattheCPRRubisCOsfunctionin a nucleoside pathway similar to that proposed in Archaea. Detection of form II/III RubisCO and nucleosidemetabolism gene transcripts fromaPER supportsthe operation of this pathwayin situ. WedemonstratethatthePERformII/IIIRubisCOiscatalyticallyactive,fixingCO tophysiologically 2 complement phototrophic growth in a bacterial photoautotrophic RubisCO deletion strain. We proposethattheidentificationoftheseRubisCOsacrossaradiationofobligatelyfermentative,small- celled organisms hints at a widespread, simple metabolic platform in which ribose may be a prominentcurrency. TheISME Journal (2016) 10,2702–2714; doi:10.1038/ismej.2016.53; publishedonline3May2016 Introduction (CPR),amonophyleticgroupofatleast35phylathat accounts for 415% of all bacterial diversity (Brown Thevastmajorityoftheorganismsintheenvironment et al., 2015). Metagenomic sampling of almost 800 have not been cultivated, obscuring our knowledge of CPR bacteria suggested that members of this radia- their physiology. Recent metagenomic investigations tion consistently have small genomes with many have revealed the presence of a large diversity of metabolic limitations, including lack of an electron uncultivated organisms in marine and terrestrial transport chain, no more than a partial tricarboxylic subsurface environments (Castelle et al., 2013, 2015; acid cycle and mostly incomplete nucleotide and Lloydetal.,2013;Bakeretal.,2015).Forexample,ina amino-acid biosynthesis pathways. Using metabolic metagenomic study of a shallow alluvial aquifer, we predictions from complete and near-complete gen- determined that many of the uncultivated bacteria omes (Wrighton et al., 2012), we inferred that were associated with the candidate phyla radiation members of the CPR are fermenters whose primary biogeochemical impact is primarily on subsurface organic carbon and hydrogen cycling (Wrighton Correspondence: JF Banfield, Department of Earth and Planetary Sciences,andDepartmentofEnvironmentalScience,Policy,and et al., 2014). However, many of the genes in these Man, University of California at Berkeley, 336 Hilgard Hall, genomesremainpoorlyannotatedandthemetabolic Berkeley,CA94720,USA. platformofCPRbacteriaremainsuncertain.Further, E-mail:[email protected] no transcription and enzyme activities have been 7Theseauthorscontributedequallytothiswork. validated, hindering knowledge of the actual reac- Received 25 August 2015; revised 29 February 2016; accepted 4March2016;publishedonline3May2016 tions catalyzed by these organisms. RubicCOinuncultivatedbacteriaoftheCPR KCWrightonetal 2703 The incorporation of CO into organic carbon is a experiments were performed by adding approxi- 2 critical step in the carbon cycle. Ribulose-1,5- mately 15mM acetate to the groundwater over the bisphophate carboxylase-oxygenase (RubisCO) is course of 72 days to maintain an average in situ the most abundant enzyme on earth and is integral concentrationof~3mMacetateduringpeakstimula- to the fixation of carbon dioxide. Currently four tion. Six microbial community samples (A–F) were forms of RubisCO can fix atmospheric carbon collected over 93 days, including 3 days prior to dioxide (I, II, II/III and III) (Tabita et al., 2007; acetate stimulation and after acetate addition ceased 2008). Forms I and II are used by plants, algae and (amendment ceased on day 72). Ferrous iron and some chemoautotrophic and phototrophic bacteria sulfide concentrations were analyzed immediately for CO fixation during primary production via the after sampling using the HACH phenanthroline assay 2 Calvin Benson Bassham (CBB) cycle. Forms III and and methylene blue sulfide reagent kit, respectively II/III, found primarily in archaea, enable light- (HACH, Loveland, CO, USA). Acetate and sulfate independent CO incorporation into sugars derived concentrations were determined using a Dionex 2 from nucleotides like adenosine monophosphate ICS-2100 ion chromatograph (Sunnyvale, CA, USA) (AMP) (Sato et al., 2007; Tabita et al., 2007). Form equipped with an AS-18 guard and analytical column II/III RubisCOs, which were primarily known in (Williams et al., 2011). Microbial cells from pumped methanogens, have greater overall similarity to groundwater that passed through a 1.2-μm prefilter bacterialFormIIRubisCObuthavespecificresidues, (Pall,PortWashington,NY,USA)wereretainedon0.2- structural and catalytic features that more closely and 0.1-μm filters (Pall). Filters were flash-frozen in resemble archaeal form III (Alonso et al., 2009). liquid nitrogen immediately upon collection. Previously, we reported the first evidence of form II/III RubisCO genes in partial genomes from the Peregrinibacteria(PER)phylumintheCPRradiation Nucleic acid extraction and sequencing (Wrighton et al., 2012). Additionally, three bacterial Approximatelya1.5-gsamplefromeachfrozenfilter formII/IIIRubisCOswererecoveredfromgenomesof was used for genomic DNA extraction. DNA was membersoftheSR1phylumthatalsoareinferredto extracted using the PowerSoil DNA Isolation Kit be fermentative (Wrighton et al., 2012; Campbell (MoBio Laboratories, Inc., Carlsbad, CA, USA). et al., 2013; Kantor et al., 2013). The presence of Illumina HiSeq 2000 2×150 paired-end sequencing these annotated RubisCO genes in small genomes was conducted by the Joint Genome Institute on thathousefewancillarygenessuggeststhatRubisCO extractedDNA.SequencingamountsforsamplesA– may have an important role in the metabolism of F from 0.1- and 0.2-μm filters ranged from 9.5 to some of these bacteria. 40.7Gbp, as described in Castelle et al. (2015). RNA To more comprehensively explore the diversity was extracted using Invitrogen TRIzol Reagent and potential role of CPR RubisCOs, we reproduced (Carlsbad, CA, USA) followed by genomic DNA the prior experiment from which we first identified removal and cleaning using Qiagen RNase-Free bacterial form II/III RubisCO sequences (Wrighton DNase Set Kit (Valencia, CA,USA) andQiagen Mini et al., 2012). Size filtration was used to increase the RNeasy Kit (Valencia, CA, USA). An Agilent 2100 relative abundance of the small-celled CPR organ- Bioanalyzer (Santa Clara, CA, USA) was used to isms in samples used for sequencing (Luef et al., assess the integrity of the RNA samples. Only RNA 2015).HereweinvestigateCPRgenomesthatencode samples having RNA Integrity Number between 8 RubisCOgenes,reportanewRubisCOsequencetype and 10 were used. The Applied Biosystems SOLiD diverged from Archaeal form III and propose a Total RNA-Seq Kit (Invitrogen/LifeTechnologies, broader metabolic context in which these genes Carlsbad, CA, USA) generated the cDNA template occur. By detecting in situ transcripts and through library. The SOLiD EZ Bead system was used to directly cloning one CPR form II/III RubisCO perform emulsion clonal bead amplification to sequence into an expression host, we suggest the generate bead templates for SOLiD platform sequen- potentialfunctionalityofthisbacterialenzyme.This cing.Samplesweresequencedonthe5500XLSOLiD study represents the first enzymatic activity demon- platform (LifeTechnologies). The 75-bp sequences strated from these broadly distributed but metaboli- produced were mapped in color space using the cally enigmatic candidate phyla bacteria. SOLiD LifeScope software version 2.5 (Life Techno logies/Invitrogen) using the default parameters against the reference genome set of 2302715 Materials and methods separate gene FASTA entries. Corresponding GTF fileswerebuilttorecordthepositionofeachgeneon Groundwater field experiment and sample collection the set of artificial chromosomes. The field experiment was carried out between 25 Augustand12December2011attheUSDepartment of Energy Rifle Integrated Field Research Challenge Genome assembly, binning and functional annotations site adjacent to the Colorado River, CO, USA as Methods for assembly, binning and annotation recentlydescribed(Brownetal.,2015;Castelleetal., are described from methods described in detail 2015; Luef et al., 2015). In brief, biostimulation in prior publications (Wrighton et al., 2012; TheISMEJournal RubicCOinuncultivatedbacteriaoftheCPR KCWrightonetal 2704 Castelle et al., 2015). Reads were quality trimmed We then used these RubisCO sequences to recruit usingSickle(https://github.com/najoshi/sickle)with additionalCPRsequencesfromcurrentlyunreported default settings. Trimmed paired-end reads were groundwater and sediment genomic data sets from assembled using IDBA_UD (Peng et al., 2012) with the same field site. default parameters. Genes on scaffolds 45kb in length were predicted using Prodigal (Hyatt et al., 2012)withthemetagenomeprocedure(-pmeta),and 16S rRNA gene recovery and phylogenetic analyses then USEARCH (–ublast) (Edgar, 2010) was used to 16S rRNA gene sequences were identified in search protein sequences against UniRef90 (Suzek assembledscaffoldsbasedonHiddenMarkovModel et al., 2007), KEGG (Kanehisa and Goto, 2000; searches using the cmsearch program from the Kanehisa et al., 2012) and an in-house database Infernal package (cmsearch –hmmonly –acc –noali composed of open reading frames from candidate –T –1) (Nawrocki et al., 2009). This protocol is phyla genomes. The in-house database includes described in detail in Brown et al. (2015). Searches previously published genomes (Wrighton et al., were conducted using the manually curated struc- 2012; Castelle et al., 2013; Hug et al., 2013; Kantor turalalignmentofthe16SrRNAprovidedwithSSU- etal.,2013;Wrightonetal.,2014)andgenomesfrom Align (Nawrocki, 2009). Sequences corresponding ongoing work and primarily aiding in identifying with CPR genomes were aligned using SSU-Align putative CPR scaffolds. For each scaffold, we along with the best hits of these sequences in the determined the GC content, coverage and profile of Silvadatabase(v1.2.11)(Quastetal.,2013),acurated phylogeneticaffiliationbasedonthebesthitforeach setofreferencesequencesassociated with candidate gene against the Uniref90 (Suzek et al., 2007) phyla of interest, and a set of archaea sequences to database. Scaffolds were binned to specific organ- serve as a phylogenetic root. Sequences with isms by using coverage across the samples, phyloge- ⩾800bpalignedinthe1582-bpalignmentwereused netic identity and GC content, both automatically to infer a maximum-likelihood phylogeny using with the ABAWACA algorithm and manually using RAxML (Stamatakis, 2014) with the GTRCAT model ggKbase (http://ggkbase.berkeley.edu/). ABAWACA of evolution selection via ProTest and 100 bootstrap is an algorithm that generates preliminary genome re-samplings. bins based on different characteristics of assembled scaffolds (https://github.com/CK7/abawaca). Six genome bins generated by ABAWACA were manu- Protein phylogenetic analyses and modeling ally inspected within ggKbase. Binning purity was For single protein trees (for example, RubisCO large confirmed using an Emergent Self-Organizing Map subunit), the phylogenetic pipeline was performed (Dicketal.,2009).Ourprimarymethodforassessing as previously described (Wrighton et al., 2012). genome completeness was based on the presence or Briefly, protein sequences were aligned using MUS- absence of orthologous genes representing a core CLE version 3.8.31 (Edgar, 2004) with default gene set that are widely conserved as single-copy settings. Problematic regions of the alignment were genes among bacterial candidate phyla, but we also removed using very liberal curation standards with assessed global markers. All single copy and meta- the program GBlocks (Talavera and Castresana, bolic analyses reported here and summarized in the 2007). Alignments with and without GBlocks were Supplementary Figures are reported in ggKbase confirmed manually. Best models of amino-acid (FASTA files can be accessed via links provided in substitution for each protein alignment were esti- the Supplementary Figures). Manual assembly cura- mated using ProtTest3 (Abascal et al., 2005). Phylo- tion improved some genome bins by extending and genetic trees were generated using RAxML with the joining scaffolds, removing local scaffolding errors PROTCAT setting for the rate model and the best and, in some cases, bringing in small previously model of amino-acid substitution specified by unbinned fragments. One genome was curated to ProtTest. Nodal support was estimated based on closure and completion. 100 bootstrap replications using the rapid boot- strapping option implemented in RAxML. Three- dimensionalstructurepredictionsweregeneratedby Inventory of the RubisCO genes SWISS-MODEL (an automated protein homology- All putative RubisCO genes on genome fragments modeling server) and i-tasser based on protein assembled from the 12 separate groundwater filter alignment and secondary structure prediction. The samples were recovered first by BLAST search alignment mode was based on a user-defined target- (Altschul et al., 1990) against an in-house database template alignment. Conservation of key catalytic that included candidate phyla sequences from prior residues and the secondary structure foreach model studies (Wrightonetal.,2012;Campbelletal.,2013; was confirmed by manual inspection. Kantor et al., 2013; Castelle et al., 2015). The taxonomic affiliation of these putative RubisCO CPR scaffolds was confirmed based on phylogenetic Plasmids and cloning analysis of genes included on the same genome Rifle metagenomic DNA wasusedasthe templateto fragmentastheRubisCOorinthesame genomebin. amplify the exact coding regions for the RubisCO TheISMEJournal RubicCOinuncultivatedbacteriaoftheCPR KCWrightonetal 2705 genes from PER genome PER_GWF2_38_29 (PER-2), biparental matings (Smith and Tabita, 2003; the OD-1 genome GWE1_OD1_1_1_39 and WS6 Satagopan et al., 2009, 2014). Plasmid-complemented genomesGWC1_1_418andGWC1_33_22_1_93.Primer strain SB I/II− colonies were first selected on peptone sequencesusedforRubisCOamplificationareincluded yeast extract plates followed by streaking onto Ormer- (Supplementary Table S1). PCR was performed using od’s minimal medium (Ormerod et al., 1961) agar the PrimeStar GXL (Clontech, Mountain view, CA, plates with no added organic carbon. To test for USA) high-fidelity polymerase. PCR products for all photoautotrophic growth complementation, plates geneswereclonedbetweenNdeIandBamHIsitesin were incubated in illuminated anaerobic jars, which plasmid pet11a (Novagen/EMD Millipore, Billerica, were periodically flushed with a gas mixture of 5% MA, USA), which does not incorporate an N-term- CO /H aspreviouslydescribed(Satagopanetal.,2009, 2 2 inal hexa-histadine tag. For the PER, the PCR 2014). Anaerobic CO -dependent photoautotrophic 2 product was also cloned between KpnI and BamHI growth was obtained using Ormerod’s minimal med- sites in plasmid p80, a high copy plasmid 3716- iumagarplatesinasealedjarflushedwitha5%CO / 2 based vector (a derivative of pLO11 obtained from 95%H gasmixture.Coloniesonminimalplatesunder 2 Oliver Lenz and Bärbel Friedrich, Berlin, Germany), these conditions were used to inoculate liquid photo- which contained the Rhodospirillum rubrum CbbR/ autotrophic cultures bubbled continuously with 5% cbbM promoter region, for expression in vivo. This CO /H for growth analysis; cells were harvested 2 2 promoterhasbeenusedpreviouslyinthebroad-host anaerobically for RubisCO activity assays, as range vector pRPS-MCS3 to drive the expression of described above. RubisCO genes to complement RubisCO deletion strain Rhodobacter capsulatus strain SB I/II (Smith and Tabita, 2003). Constructs for the coding regions In vitro RubisCO assays of archaeal form III Archaeoglobus fulgidus RbcL2 RubisCO assays with activated enzyme preparations (Kreel and Tabita, 2007), archaeal form II/III Metha- were performed according to Satagopan et al. (2014) nococcoides burtonii rbcL and bacterial form II R. with modifications to ensureanaerobiosis.All crude rubrum cbbM (Falcone and Tabita, 1993) were used proteinlysates, reagentsandsupplieswereprepared for comparison and served as positive controls. The inananaerobicchamberandcrimpedinsealedglass sequencesofallconstructswereverifiedatthePlant- vials prior to conducting the assays in a hood using Microbe Genomics Facility at The Ohio State gas-tight Hamilton syringes (Hamilton Robotics, University. Reno, NV, USA). Assay reaction mixtures contained 50mM NaHCO3 (~2μCi of NaH 14CO3), 10mM MgCl2 and 0.8mM ribulose 1,5 bisphosphate (RuBP) in Growth and culture conditions 50mMbicine-NaOHbuffer,pH8.0.RubisCOactivity To test for in vitro activity, RubisCO proteins from isinitiatedbyaddition ofitsphysiologicalsubstrate, pet11a constructs were expressed in Escherichia RuBP. Reactions were initiated with the addition of coli strain Rosetta 2 (DE3) pLysS (Novagen), which crude protein lysate to its physiological substrate, usesthepRAREplasmidtoaccommodateexpression RuBP,at30°Candincubatedfor5–30min.Negative of rare codons such as those from candidate phyla control reactions consisted of no RuBP or no crude and archaeal RubisCO sequences. Cells were grown protein lysate added. Carboxylase activities were in 50ml lysogeny broth supplemented with determined following the equimolar conversion of 150μgml−1 ampicillin and 12.5μgml−1 chloram- 14C from NaHCO into acid-stable 3-phosphoglycerate 3 phenicol in 125-ml flasks. After reaching an optical (3-PGA), as measured via radioactive scintillation density of 0.6–0.8 (at 600nm), RubisCO gene counting. Protein was determined via the Bradford expression was induced by adding IPTG to a final method and specific activities (nmol CO fixed per concentration of 1mM. After 5–16h of shaking at minper mg protein) were calculated cons2istent with roomtemperature,cells were harvested, flash frozen prior publications (Satagopan et al., 2009). andstoredat−80°Cpriortoanalysis.Toensurethat candidatephyla RubisCOswere exposed tominimal amounts of oxygen, cell lysis and assay conditions Results and discussion were performed under strict anaerobic conditions. Using an anaerobic chamber, cell pellets were Recovery of CPR genomes that contain RubisCO resuspended in 50mM bicine-NaOH, 10mM MgCl2, We reconstructed CPR genomes from sequence data at pH 8.0, supplemented with 2mM NaHCO3 and setsforsamplescollectedover93days(Brownetal., 1mM dithiothreitol; crude protein lysates were 2015; Castelle et al., 2015). As previously reported, obtained via bead beating with 0.1-mm beads for bacteria from CPR phyla WS6, OD1 and OP11 were 3–4-min intervals in tightly capped tubes. enriched (475% 16S rRNA relative abundance) on To test for in vitro activity from complemented the 0.1-μm filter, and some members were shown to autotrophic cultures, p80::PER, p80::R. rubrum and have ultra-small cells (median cell volume of pRPS-MCS3::A. fulgidus RubisCO constructs were 0.009±0.002μm3) (Luef et al., 2015). Multiple transformed into E. coli strain S17-1 (ATCC47055) genomes belonging to the CPR phylum PER were and mobilized into R. capsulatus SB I/II− via recovered from all six 0.2-μm filter data sets, TheISMEJournal RubicCOinuncultivatedbacteriaoftheCPR KCWrightonetal 2706 suggesting that these cells may be larger in size than the three partial PER genomes analyzed from other CPR bacteria (for example, OD1, WS6) samples collected from the same aquifer 5–7 days enriched on the 0.1-μm filters. after acetate stimulation, two of which (ACD51, Taxonomic identifications of the CPR organisms ACD65) encoded RubisCO genes (Wrighton et al., from which we recovered RubisCO genes are 2012). Phylogenetic analyses that include these and provided in Supplementary Figures S1 and S2. The otherrecentlyreportedsequences(Rinkeetal.,2014; emergent self-organizing map clustered genome Brown et al., 2015) support the identification of this fragments based on their tetranucleotide sequence lineage as a separate phylum (Candidatus Peregrini- composition (Supplementary Figure S3) and vali- bacteria). Based on sequence divergence, the five dated 16 unique CPR genomes that ranged in Peregrinibacterial genomes reported here represent estimated completion from near-complete (494%) multiple taxonomic classes in this newly designated to complete (Supplementary Figure S4). Here we phylum. report RubisCO sequences from five genomic bins assigned to the PER phylum, two to the Parcubac- teria (OD1), five to the Microgenomates (OP11) and CPR genomes contain novel and diverse RubisCO six to the WS6 phylum (Table 1). We also report sequences RubisCO from additional scaffolds assigned to these Fifty-one new RubisCO genes were identified from CPR phyla. these CPR genomes. Some sequences, for example, The five WS6 genomes reported here span multi- from the WS6-1 and OD1-1 genomes, were indepen- pletaxonomicclasses(SupplementaryFigureS1and dently reconstructed from multiple data sets. Ulti- Supplementary Table S2) (Brown et al., 2015). The mately, 18 representative unique sequences were manually curated and closed WS6 genome is manually confirmed by read mapping to verify their 0.896Mbpinlengthandisthefirstcompletegenome accuracy (Figure 1). These sequences were most reconstructed for a member of this lineage. The similar to previously reported CPR or archaeal mapped read file confirming the assembly and the RubisCO sequences (and not bacterial RubiscCO annotations can be accessed via ggKbase (see sequences)(SupplementaryTableS3)andthemajor- Methods section). Given the level of genome com- ity harbor the key active-site residues (Figure 2). pletion and the breadth of phylogenetic diversity Phylogenetic analyses revealed that the seven sampled here (Konstantinidis and Rosselló-Móra, bacterial CPR RubisCO sequences reported here are 2015), we propose the name Candidatus Dojkabac- members of the previously defined archaeal inter- teria for the WS6 phylum to honor the memory of mediateformII/IIIRubisCOs(Figure1)(Tabitaetal., Michael A Dojka, who first reported this lineage 2007, 2008). Recent metagenomic studies reported basedon16SrRNAgenesequencedata(Dojkaetal., the form II/III RubisCOs from PER and SR1 unculti- 2000) vated bacterial lineages (Wrighton et al., 2012; We recovered five distinct PER genomes that we Campbell et al., 2013; Kantor et al., 2013). Although estimated to be 493% complete (Table 1; the overall sequence similarity is most closely Supplementary Figure S4). This data set augments related to type II, protein sequence alignment Table1 SummaryofCPRgenomesreportedinthisstudy GenomeID IDggKbase Taxonomic Completeness(%) Number RelativeGC Bin No.of RubisCo affiliation of content(%) length protein- form scaffolds (Mbp) codinggenes PER-1 PER_GWA2_38_35 Peregrinibacteria 95 5 38.2 1.11 1020 II/III PER-2* PER_GWF2_38_29 Peregrinibacteria 93 23 38 1.23 1164 II/III PER-3 PER_GWF2_39_17 Peregrinibacteria 100 15 38.8 1.31 1129 II/III PER-4 PER_GWC2_39_14 Peregrinibacteria 98 21 38.5 1.32 1247 II/III PER-5 PER_GWF2_43_17 Peregrinibacteria 100 16 43.1 1.20 1136 II/III WS6-1 WS6_GWF2_39_15 Dojkabacteria 100(closed) 1 38.7 0.896 891 II/III WS6-2 WS6_GWC1_33_20 Dojkabacteria 98 23 32.7 0.65 666 III-like WS6-3 GWF1_WS6_37_7 Dojkabacteria 100 18 36.7 1.22 1153 III-like WS6-4 WS6_GWE2_33_157 Dojkabacteria 100 29 32.8 0.61 628 III-like WS6-5 WS6_GWE1_33_547 Dojkabacteria 95 34 32.5 0.61 649 III-like WS6-6 WS6_GWF1_33_233 Dojkabacteria 98 23 32.7 0.61 640 III-like OD1-1 OD1_GWE2_42_8 Parcubacteria 93 34 42.5 0.743 781 III-like OD1-2 OD1_RIFOXA1_OD1_43_6 Parcubacteria 84 183 31.7 0.67 850 III-like OP11-1 OP11_GWA2_42_18 Microgenomates 95 53 42.3 1.24 1324 III-like OP11-2 GWA2_OP11_43_14 Microgenomates 100 24 43.2 1.65 1699 III-like OP11-3 RBG_16_OP11_37_8 Microgenomates 95 84 37.5 1.31 1506 III-like Abbreviation:CPR,candidatephylaradiation.Boldedtextincludesmanuallycuratedandconfirmednear-completeorcompletegenomes,while asterisk(*)denotesRubisCOthatwasclonedandfunctionallyconfirmed. TheISMEJournal RubicCOinuncultivatedbacteriaoftheCPR KCWrightonetal 2707 Intermediate type II/III Insertion of 48 amino acids PER2 WS6-1* 1RSOP11SR1 PEWR3SP6E-R1P1*EP*RE5R2/4 Bacterial type II Type IA/IB 00..11 BBaacctteerriiaall CCaannddiiddaattee PPhhyyllaa SSRR11 OOPP1111 WWSS66 100 OODD11 Type IC/ID 90 PPEERR 100 100 To type IV 94 RuBisCO-like 100 64 91 100 99 60 100 Firmicutes 100 100 OD1 SCG MethanoMseatrhciannaolceAosccicdaulliepsrofundum AHrcahloabeaacl tTeyripaele sIII WS6-3 Archaeal Type III 24In asmeritnioon a ocfid s OODP11-11-1/2/3OD1-1 WS6-2 WS6-2/4/5/6 Bacterial Type III-like Figure1 MaximumlikelihoodphylogenetictreeconstructedfortheRubisCOlargesubunit.Keybootstrapvalues450areshownbased on100resamplings.ProteinmodelsofthePERandDojkabacteria(WS6)typeII/IIIRubisCOsandtheDojkabacteriaandParcubacterium (OD1)typeIII-likeRubisCOsareincluded.Theasterisk(*)referstogenomesthatarecompleteornearlycompleteandhand-curated.The additional inserts of the form II/III RubisCO from the closed Dojkabacteria genome (48 residues) or of the form III-like from the Parcubacteriumgenome(24residues)arehighlightedinredontheproteinmodels(ExpandedviewshowninSupplementaryFigureS6). confirmed the presence of form III-specific residues genomes to date lack the insertion. The Microgeno- (Supplementary Figure S5), validating the form II/III mates and two of the Dojkabacteria RubisCOs phylogeneticplacementofthesebacterialsequences. reported here have the 29 amino-acid insertion Our results expand the bacterial members of this found in the SR1 and archaeal II/III form, while the form by four additional PER sequences, add the first closed WS6 genome (WS6-1) encodes a RubisCO three Dojkabacteria sequences and add a single-cell with a novel 48 amino-acid insertion (Figure 1, unpublished Microgenomates (OP11) sequence. Ori- Supplementary Figures S5 and S6). Both the 29 and ginally, two form II/III RubisCO subgroups were 48amino-acidinsertionsequencesarelocatedinthe described based on the absence (PER) or the same position as previously identified insertions. presence (SR1 and archaea) of a 29 amino-acid Similar to the methanogen sequences, the insertion N-terminal insert sequence (Wrighton et al., 2012; is sandwiched between catalytic sites of the enzyme Campbelletal.,2013;SupplementaryFiguresS5and (Supplementary Figure S5), yet impact of these S6). Here our additional sampling of the PER insertions on enzyme biochemical properties is RubisCO confirms this earlier discovery, as all PER currently unknown. TheISMEJournal RubicCOinuncultivatedbacteriaoftheCPR KCWrightonetal 2708 Figure 2 Conservation of RubisCO active-site residues. Alignment of active-site residues from representative RubisCO and RLP sequences as noted previouslyby Tabitaet al., 2007,alongwith equivalentresiduesfrom the new sequencesdescribedin this study. Residues are noted in single-letter IUPAC code. Coloring scheme is based on the identities of residues relative to the Synechococcus elongatus PCC 6301 reference sequence. Amino-acid numbering according to Synechococcus elongatus PCC 6301 reference sequence have been indicated. Green shading refers to conserved residues, yellow indicates semi-conserved residues and red represents non- conservedresidues.Catalytic(C)andRuBP-binding(R)residuesareindicatedonthetop;Xdenotesmissingresidue.TheRubisCOmotif referstoastretchofconservedresiduesthatareproximalintheprimarystructureandisidentifiedbyconsensusresidueidentitiesshaded gray.ResiduesK,DandEfromthismotifparticipateincatalysis. InadditiontotheformII/IIIsequences,nineofour insertions between alpha-helix 6 and beta-sheet 7 newly recovered bacterial sequences form a distinct (Figure 1, Supplementary Figures S5 and S6). The and strongly supported monophyletic group that WS6 genome containing divergent form III also branchesdeeplyfrom,yetismostsimilarto,theform contains the eight amino-acid insertion. To further III RubisCO sequences found in archaea. This confirm analyses indicating that these insertions currentCPRcladeincludesRubisCOsequencesfrom were not due to sequencing or assembly error, we Dojkabacteria (3), Parcubacterium (1) and Microge- designedspecificPCRprimersforthegeneencoding nomates (5). Although the extent of sequence for the large RubisCO subunit and confirmed the divergence between these bacterial sequences and insertions via cloning and sequencing. Prior the archaeal form III sequences is considerable, sequence analyses demonstrated that this region additional taxon sampling is needed to resolve this was involved in interactions with the neighboring cladeascompletelydistinctRubisCOform(Figure1, subunit (Satagopan et al., 2014); however, these Supplementary Figure S7). In our analyses, we have insertions have not been previously identified in identified other bacterial genomes that contain other RubisCO sequences. homologs to archaeal form III sequences in public databases (for example, from a partial single-cell Parcubacteria genome and in two Firmicutes draft CPR RubisCO is inferred to support a heterotrophic genomes; Supplementary Figure S7). Notably, these nucleoside pathway bacterialformIIIarenotmonophyleticwiththeCPR Previously studied archaeal form III (Finn and RubisCOs described here but clade with the canoni- Tabita, 2004; Sato et al., 2007; Estelmann et al., cal archaeal form III. 2011) and form II/III (Tabita et al., 2008; Alonso The parcubacterial sequence from the OD1-1 et al., 2009) RubisCOs are not associated with the genome contains two, an 8 and 24 amino acid, CBB pathway but instead function in a nucleoside- TheISMEJournal RubicCOinuncultivatedbacteriaoftheCPR KCWrightonetal 2709 salvaging pathway. Consistent with the archaeal RuBPbyanenzymeannotatedtofunctioninthiamine genomes that contain RubisCO, all bacterial CPR monophosphate formation (Figure 3, unique pathway genomes containing RubisCOs sampled to date lack denoted in yellow; Finn and Tabita, 2004) (that is, the gene encoding phosphoribulokinase, a key gene PRPP→ribose 1,5 bisphosphate→RuBP). Given that oftheCBBpathway(Wrightonetal.,2012;Campbell onlyonePeregrinibacteriumgenomecontainsadistant et al., 2013; Kantor et al., 2013). However, we note homologofthisRuBP-formingenzyme,andallofthese thesegenomescontainmanygenesthataretoonovel CPR were recovered from an ~14°C aquifer, this to be identified based on sequence similarity and pathwayseemslesslikelyinmostCPRbacteriawhose thus we cannot eliminate the possibility that they genomes encode RubisCO. However, we note many contain new enzymes or divergent phosphoribuloki- PER genomes encode an enzyme that can convert nasehomologs(Ashida etal.,2008).Consistentwith PRPP to AMP. This enzyme, adenine phosphoribosyl- our hypothesis that these RubisCO function in transferase, may represent an alternative mechanism archaeal nucleoside pathway, outside other CPR by which PER genomes can use PRPP via the AMP sequences, these RubisCO proteins have the closest pathway and not the pathway previously reported in homologs to similar archaeal and not bacterial thermophilic archaea (that is, PRPP→AMP→ribose RubisCOs (Supplementary Table S3). Thus we 1,5 bisphosphate →RuBP). Thus we cannot rule out suggest RubisCOs studied here also function in a that PRPP may also be used as a source for RuBP in nucleoside/nucleotide metabolism pathway analo- some members of the CPR via a T. kodakarensis-like gous to archaea. pathway (Figure 3). Rather than using phosphoribulokinase, there are Very recently, Aono et al. (2015) identified two proposed pathways to generate RuBP, the alternative enzymes involved in the nucleoside substrate for RubisCO in Archaea. In the first path- metabolism pathway in T. kodakarensis, linking way, identified in Thermococcus kodakarensis, nucleoside moieties to central carbon metabolism. nucleosides are converted into 3-PGA. The first step Instead of AMP phosphorylase (step 1 T. kodakar- in this short pathway dephosphorylates the nucleo- ensis pathway), the ribose moieties of adenosine, tide (AMP, CMP, UMP) to release the base (for guanosine and uridine are converted to ribose-1- example,adenine),yieldingribose-1,5-bisphosphate, bisphosphate (R1P) by three nucleoside phosphor- which an isomerase converts to ribulose-1,5-bispho- ylases. R1P is converted to ribose-1,5-bisphosphate sphate (RuBP) (that is, AMP→ribose 1,5 bispho- (R15P) via a nucleoside kinase (ADP-dependent sphate→RuBP). From RuBP and CO , the type III or ribose-1-phosphate kinase). R15P is likely directed 2 II/III archaeal RubisCO generate two molecules of 3- toglycolysisviatheR15PisomeraseandRubisCO.In PGA, a central intermediate of core carbon metabo- this metabolic context, using phosphate and ADP, lism (Aono et al., 2012). the ribose moieties of adenosine, guanosine and AllCPRgenomeswithRubisCOreportedhereand uridineareconvertedtoR15PviaR1Pbynucleoside studied previously possesses the two archaeal AMP phosphorylases and ADP-R1P kinase (Aono et al., pathway homologs (Figure 4), with the best hits to 2015) (Figure 3, blue dotted line). Finally, R15P can the AMP phosphorylase and the isomerase outside be directed to glycolysis via the functions of R15P the CPR are largely from archaeal genomes isomerase and RubisCO (Aono et al., 2015). We (Supplementary Table S3). For both proteins in this examined the CPR genomes for this alternative pathway,themostsimilarsequencesoutsidetheCPR T. kodakarensis pathway and found some (albeit are to homologs in archaeal genomes, often times to weak) support for this pathway in Microgenomates methanogens (PER) or to uncultivated archaeal and most PER genomes. Thus it cannot be ruled out lineages such as the DPANN (Castelle et al., 2015). thatsomeCPRbacteriacoulduseanalternativeroute We used an amino-acid alignment and protein through which R15P is generated from the phos- modeling to confirm that the CPR isomerase has all phorolysis of nucleosides (Figure 3). residues for catalytic activity and is specific for CPRorganismsthatcontainRubisCOareinferredto ribose1,5bisphophateandnotanothersubstrate,for be obligate fermenters (Supplementary Figure S10), example, 5-methylthioribose-1-phosphate isomerase consistent with prior metabolic analyses (Wrighton (Supplementary Figures S8 and S9). Based on the et al., 2012; Campbell et al., 2013; Kantor et al., genomic evidence to date, we propose that the CPR 2013; Wrighton et al., 2014). All of the genomes bacteria likely use a RubisCO pathway analogous to lack a complete tricarboxylic acid cycle, NADH that of T. kodakarensis (Figure 3, denoted in green). dehydrogenase (complex I) and most other com- IncontrasttoThermococcuspathway,otherthermo- plexes from the oxidative electron transport phos- philic archaeal (Methanocaldococcus jannaschii, phorylation chain (for example, complex II, Methanosarcina acetivorans and Archaeoglobus litho- complex III, complex IV and quinones). All CPR trophicus) use a second pathway. In this pathway, genomes included in this study, besides the ribose-phosphate pyrophosphokinase (PRPP) is sug- Dojkabacteria (WS6), have complete or near- gested to be desphophorylated to ribose 1,5 bispho- complete glycolysis and/or pentose phosphate sphateviaanabioticreactionthatisfacilitatedbyhigh pathway(s) and the capacity to produce acetate, temperatures. In archaea, the non-enzymatically pro- lactate and/or hydrogen as byproducts of fermenta- duced ribose 1,5-bisphosphate is then converted to tion.Notably,thePER,incontrasttobacteriaofthe TheISMEJournal RubicCOinuncultivatedbacteriaoftheCPR KCWrightonetal 2710 Glucose-6P Glucono-1,5-lactone-6P Gluconate-6P G6PD PGLS 6 1.1.1.49 6 3.1.1.31 6 GPI 5.3.1.9 hxlB arabino-3-Hexulose-6P hxlA HCHO CO2 5.3.1.27 6 4.1.2.43 Xylulose-5P Extracellular Fructose-6P tkt RPE RNA 6 2.2.1.1 5 5.1.3.1 5 Ribulose-5P ? Erythrose Sedoheptulose rpiA 5.3.1.6 AMP PRPP -4P -1,7P2 FBP 3.1.3.11 4 7 5 2.7.6.1 5 A2.P4.R2T.7 AMP Ribose-5P PRPP Fr1u,c6tPos2e- 6 tal 2.2.1.2 + ATP synthetase Adenine 2.2.1.1 tkt 7 Sedoheptulose-7P 2.4.2.57 AMPase ALDO 4.1.2.13 DAK DAS Glycerone-P 2.7.1.29 2.2.1.3 3 3 Pi 3 5.T3.I1M.1 GlyceroneHCHO 2.7.1.19 ? Adenine Phosphoribulokinase Glyceraldehyde-3P GAPDH 1.2.1.12 3 Glycerate-1,3P2 PGK 2.7.2.3 CO 2 RuBisCO Isomerase 3 4.1.1.39 5 5.3.1.29 5 Glycerate-3P Ribulose-1,5P2 Ribose-1,5P2 gpmI 5.4.2.12 ThiC 3 Glycerate-2P NAD+(?) ENO 4.2.1.11 3 Phosphoenolpyruvate PK 2.7.1.40 PFOR Acetyl-CoA PTA Acetyl-P ackA Acetate 3 1.7.1.3 3 2.3.1.8 3 2.7.2.1 3 Pyruvate ATP 6.2.1.13 ACS ATP Figure3 ProposedmetabolicroleofRubisCO,withgenesidentifiedinthePER-1(blueline)andWS6-1genomes(purpleline).Black lines indicate genes not identified in either organism. Thick blue arrows represent genes identified by metatranscriptomics (SupplementaryTableS4).YellowlinesdenotedtheproposedPRPPpathwayindicatedinthetext.Abbreviationsnotindicatedinthe text: G6PD, glucose-6-phosphate dehydrogenase; PGLS, 6-phosphogluconolactonase; hxlB, 6-phospho-3-hexuloisomerase; hxlA, 3- hexulose-6-phosphatesynthasePgm,phosphoglucomutase;GPI,glucose-6-phosphateisomerase;FBP,fructose-1,6-bisphosphatase;Aldo, fructose-bisphosphate aldolase; GAPDH, glyceraldehyde 3-phosphate dehydrogenase, Pgk, phosphoglycerate kinase; TIM, triosepho- sphate isomerase; GpmI, 1,3-bisphosphoglycerate-independent phosphoglycerate mutase; Eno, enolase; PK, pyruvate kinase; PFOR, pyruvate/2-oxoacid-ferredoxin oxidoreductase; ACS, Acetyl-CoA synthetase; PTA, phosphotransacetylase; ackA, acetate kinase; RpiA, ribose 5-phosphate isomerase A; RPE, ribulose-phosphate 3-epimerase; Tkt, transkelotase; Tal, transaldolase; APRT, adenine phosphoribosyltransferase; PRPP synthetase, ribose-phosphate pyrophosphokinase; DAK, glycerone kinase; DAS, dihydroxyacetone synthase;AMPase,AMPphosphorylase;isomerase,Ribose1,5-bisphosphateisomerase;NMP,nucleoside5′-monophosphate. other CPR genomes with RubisCO described here, for the other CPR genomes. Enzymes that were seemtohavethecapacitytosynthesizenucleotides identified include those required for the last steps andsomeaminoacids(SupplementaryFigureS11). of glycolysis from PGA to pyruvate and ultimately The metabolic information gleaned from the first fermentationtoacetateandATP.Notably,PGAisthe complete WS6 genome issparsecompared with that product of RuBP conversion by RubisCO (Figure 3). TheISMEJournal RubicCOinuncultivatedbacteriaoftheCPR KCWrightonetal 2711 We failed to identify genes for upper glycolysis, would minimize the energetic cost of upper glyco- gluconeogenesis and the pentose phosphate path- lysis, maximizing energy yields for these organisms way in the WS6 genomes (Supplementary such as WS6 with tiny genomes. FigureS10). Gene andproteinsequencesdiscussed inthismanuscriptcanbeaccessedthroughlinksin Supplementary Figures S10 and S11. This finding CPR RubisCO transcripts detected in the environment is similar to reports of bacteria of the SR1 phylum We examined the metatranscriptomic sequence (placement within the CPR is uncertain) that also data set collected from the 0.2-μm filter samples encode form II/III RubisCO and lack these path- for reads that mapped uniquely to PER-1 genome ways (Campbell et al., 2013; Kantor et al., 2013) (Supplementary Table S4), given that these organ- (Supplementary Figure S10). Complete and closed isms were enriched on this filter (there was insuffi- SR1 and WS6 genomes also lack pathways for cient biomass after metagenomics to investigate the biosynthesis of nucleotides (purines and pyrimi- 0.1-μm filters). In the sample collected prior to dines) and amino acids (Kantor et al., 2013). acetate amendment (sample A), we confirmed the Consistent with reports for SR1 and some small expression of RubisCO (Scaffold 3_gene102) and the CPRgenomes(Kantoretal.,2013;Gongetal.,2014; surrounding gene neighborhood (3_105, 3_107, Brown et al., 2015; He et al., 2015; Luef et al., 3_109) (Figure 4a). Notably, besides RubisCO, all 2015), we suggest that WS6 bacteria reported here of these transcribed genes were annotated as have an obligatory symbiotic or parasitic lifestyle hypothetical. Also in sample A, we detected tran- and are not free-living. scripts for step 1 of the AMP pathway (AMP It is possible that WS6 and other CPR that encode phosphorylase (Scaffold 5_gene7)) and energy genera- RubisCO may be reliant on other members of the tion via acetate production (phosphotransacetylase community for external ribose, which is utilized as (Scaffold 5_gene5)), as well as a poorly annotated carbon and energy sources. A bacterial or archaeal phosphoesterase (Scaffold 5_gene4) (Figure 4b). The cell is on average 20% by weight RNA, and RNA is expression of RubisCO, a key gene in the AMP 40%byweightribose,meaningthatacellisroughly pathway, and genes for acetate production from the 8%pureribose.Arecentpaperusedthisinformation same genome prior to stimulation suggests a role for to suggest that ribose was likely an abundant sugar the RubisCO and likely the AMP pathway in bacteria availableonearlyearthforfermentationandthatthe under natural aquifer conditions. archaeal type III RubisCO pathway of nucleoside monophosphate conversion to 3-PGA might be a relicofancientheterotrophy(Schönheitetal.,2015). The PER RubisCO can support autotrophic In a similar manner, we suggest that in WS6 the CO -dependent growth 2 ribosemoietycanbefunneledintoRuBPviathetwo We synthesized a recombinant protein in E. coli gene AMP pathway and converted to PGA via ofthePER-1formII/IIIRubisCOthatwastranscribed RubisCO to ultimately yield ATP and acetate via in situ. We demonstrated CO fixation activity 2 substrate-level phosphorylation. This use of ribose from the crude extract using RuBP as a substrate Pyrimidine metabolism Purine metabolism Hypothetical protein Acetate production Cell wall-related protein Peptidase Figure4 GeneneighborhoodsneartheformII/IIIRubisCO(a)aswellasAMPphosphorylaseandR15Pisomerase(b)fromthePER-1 genome.(a)RubisCO(red)clusterswithgenesforpyridmidineandpurinemetabolism.(b)Genesforproducingacetate(blue)areneartwo homologsfromtheAMPpathway(darkgreen).GeneclusterswereorganizedandvisualizedusingGeneiousv7.0.6. TheISMEJournal
Description: