5678–5691 NucleicAcidsResearch,2018,Vol.46,No.11 Publishedonline16May2018 doi:10.1093/nar/gky284 Methylation guide RNA evolution in archaea: structure, function and genomic organization of 110 / C D box sRNA families across six Pyrobaculum species Lauren M. Lui1, Andrew V. Uzilov1, David L. Bernick1, Andrea Corredor1, Todd M. Lowe1,* and Patrick P. Dennis2 D o 1DepartmentofBiomolecularEngineering,UniversityofCaliforniaSantaCruz,SantaCruz,CA95064,USAand w n 2DepartmentofBiology,WhitmanCollege,WallaWalla,WA99362,USA lo a d e d ReceivedNovember17,2017;RevisedApril03,2018;EditorialDecisionApril04,2018;AcceptedApril23,2018 fro m h ttp s ABSTRACT INTRODUCTION ://a c Archaeal homologs of eukaryotic C/D box small nu- In eukaryotic cells, ribosome assembly occurs in the nu- ad e cleolar RNAs (C/D box sRNAs) guide precise 2(cid:2)-O- cleolus, a specialized structure located within the nucleus. m methylmodificationofribosomalandtransferRNAs. At this site, ribosomal RNA (rRNA) is transcribed, mod- ic.o AlthoughC/DboxsRNAgenesconstituteoneofthe imfieadl,pprorotecienssseidn,tofotlhdeedlaargnedaansdsesmmbalelldriabloonsogmwailthsurbibuonsitos-. up.co largest RNA gene families in archaeal thermophiles, m mtaotisotngbeencoamuesse hrealviaebilnec,ofmulplyleateutsoRmNaAtedgedneeteacntnioon- T(shneoRnuNcAleos)luthsaatlsaorecorenqtauiinresdafloarrgtheenmumodbiefircoaftisomnaalnldRNmAats- /nar/a methods are not available. We expanded and cu- urationofrRNA,andimplicatedaschaperonesinfolding rtic (reviewed in (1)). These snoRNAs are incorporated into le rated a comprehensive gene set across six species dynamic ribonucleoprotein (RNP) complexes that act as -ab ofthecrenarchaealgenusPyrobaculum,particularly molecularprocessingmachinesalongtheribosomeassem- stra richinC/DboxsRNAgenes.Usinghigh-throughput blyline.MostsnoRNAscontainguidesequencesthatbase ct/4 small RNA sequencing, specialized computational pairwithrRNA,facilitatingprecisemodificationofribonu- 6 /1 searches and comparative genomics, we analyzed cleotideswithintheregionofcomplementarity.ThesnoR- 1 526 PyrobaculumC/D box sRNAs, organizing them NAs divide into two classes: C/D box snoRNAs, which /567 guide 2(cid:2)-O-methylation of ribose and H/ACA box snoR- 8 into110familiesbasedonsyntenyandconservation /4 NAs, which guide the conversion of uridine to pseudouri- 9 ofguidesequenceswhichdeterminemethylationtar- 96 dine(2,3).Althougharchaealcellsdonotcontainanorga- 5 gets.Weexaminedgeneduplicationsandrearrange- nizednucleolarstructure,theypossessandutilizebothC/D 77 ments, including one family that has expanded in boxandH/ACAboxsno-likeRNAs(sRNAs)inthemod- by a pattern similar to retrotransposed repetitive ele- ificationofrRNAandassemblyofribosomalsubunits(re- gu e mentsineukaryotes.Newtrainingdataandinclusion viewedin(1,4)).Notably,bacteriadonotuseRNA-guided st o of kink-turn secondary structural features enabled modification,soarchaeahavebecomethemostconvenient, n 0 creationofanimprovedsearchmodel.Ouranalyses minimally complex models for studying this innovation in 2 A provide the most comprehensive, dynamic view of enzymaticflexibility. p C/D box sRNA evolutionary history within a genus, Archaeal C/D box sRNAs are generally about 50 nu- ril 2 0 in terms of modification function, feature plasticity, cleotides(nt)inlengthandcontainhighlyconservedC(RU- 19 GAUGA consensus) and D (CUGA consensus) box se- andgenemobility. quencesatthe5(cid:2) and3(cid:2) endsofthemolecule,andlesscon- served versions (designated C(cid:2) and D(cid:2)) near the center of themolecule(5).TheseRNAsfoldintoahairpinasaresult of the formation of a kink-turn (K-turn) structural motif *Towhomcorrespondenceshouldbeaddressed.Tel:+18314591511;Fax:+18314594482;Email:[email protected] Presentaddresses: LaurenLui,EnvironmentalGenomicsandSystemsBiologyDivision,LawrenceBerkeleyNationalLaboratory,Berkeley,CA94702,USA. AndrewUzilov,DepartmentofGeneticsandGenomicSciences,IcahnInstituteforGenomicsandMultiscaleBiology,IcahnSchoolofMedicineatMountSinai, NewYork,NY10029,USA. (cid:3)C TheAuthor(s)2018.PublishedbyOxfordUniversityPressonbehalfofNucleicAcidsResearch. ThisisanOpenAccessarticledistributedunderthetermsoftheCreativeCommonsAttributionNon-CommercialLicense (http://creativecommons.org/licenses/by-nc/4.0/),whichpermitsnon-commercialre-use,distribution,andreproductioninanymedium,providedtheoriginalwork isproperlycited.Forcommercialre-use,[email protected] NucleicAcidsResearch,2018,Vol.46,No.11 5679 throughtheinteractionoftheCandDboxsequencesand aligned and curatedthemintohomologous familiesbased aK-loopmotifthroughtheinteractionoftheD(cid:2)andC(cid:2)box on sequence similarity and methylation target prediction. sequences(Figure1A).TheK-turnandtheK-loopareeach Finally, we used this extensive dataset to make a range of recognized by the protein L7Ae (6). The binding of L7Ae new observations regarding sRNA evolution between the stabilizestheRNAstructureandallowstwocopieseachof sixPyrobaculumgenomesbasedon(i)variationinsequence Nop56/58(alsocalledNop5)andfibrillarintobind,com- features within homologous sRNA families, (ii) inferred pleting the assembly of the active RNP complex (7). The conservation of function in terms of rRNA modification fibrillarin protein is an S-adenosyl methionine-dependent and ribosome assembly and (iii) sRNA gene context and RNAmethylase,andisresponsibleforthecatalyticactivity plasticity(mobility,duplication,flankinggeneorientation). oftheRNPcomplex. This new reference set also enabled us to create and test The two guide regions between the C and D(cid:2) boxes and anew,highlysensitivecomputationalsearchmodelforar- between the C(cid:2) and D boxes are unstructured and each is chaealC/DboxsRNAs. availabletobasepairwithan∼8–12ntlongtargetsequence D o (Figure1A).InadditiontorRNAtargets,asignificantpro- w n portionofarchaealsRNAshaveguideregionsthatarecom- MATERIALSANDMETHODS lo a plementary to transfer RNA (tRNA) (7,8). Methyl modi- Growth conditions of Pyrobaculum oguniense and RNA ex- de d fication in the target RNA occurs at the nucleotide posi- traction fro tionthatbasepairswiththeguide5ntupstreamfromthe m firstbaseoftheD(cid:2)orDboxsequence.Thisisknownasthe Pdiytiroonbascausludmesocrgiubneidenpsreevcuioltuusrlyes(1u9n)d.eBrrmieiflcyr,o-cauelrtuorbeiscwcoenre- http ‘N+five’ruleandmethylationtargetsarereferredtoasthe s D and D(cid:2) targets (2). Many C/D box sRNAs with canon- growninmodifiedDSM390medium,using1gtryptone,1g ://a ical box features have guide regions that lack complemen- ydeearsmteixcrtora-acte,rpoHbic7,csounpdpitleiomnesnatted90w◦iCth.1R0NmAmwNaas2eSx2tOra3cutend- cade tarity to rRNA and tRNA sequences; these are known as m ‘orphan’ guides and may target other RNAs, but to date fromcellpelletsusingTRIzolreagentandpurifiedusingan ic.o no conserved targets to other RNAs have been identified ethanolprecipitationfollowedbya70%ethanolwash. up withintheArchaea(1). .co m TheevolutionofC/DboxsRNAsaffectsribosomefunc- SmallRNAsequencingandreadprocessingofPyrobaculum /n tion and the host genome. In all domains of life, ribose species ar/a mofetethnyfloatuinodnsinhefulpncsttiaobnialilzlye iRmNpAortsatnrutcrteugrieonasnodfathree rmiboos-t SmallRNAsequencingdataforPae,Par,PcaandPiswere rticle some, such as the peptidyl transferase center in domain V minedfromapreviousstudyfromourlab(18).Thelibraries -ab of the large ribosomal subunit (9). Although elimination for these four species were sequenced on the Roche/454 stra toifoninsdaipvipdeuaarlt2o(cid:2)-hOav-meelitthtylelaetfifoencstobnytCh/eDcelbl,oxglosRbaNlAdydsreelge-- GquSenFcLedXbsyeqthueenUceCr.DSmavaisllSReqNuAenlcibinrgarFieascifloitryPoongIwlleurmeisnea- ct/46 ulationofthemethyltransferasefibrillarinhasprofoundef- HiSeq 2000 to produce 2 × 75 nt paired-end sequencing /11 fects, possibly including cancer in humans (10,11). In ad- reads. Sample preparation of small RNA libraries for Pog /56 7 dition to their role in ribose methylation, the propagation are described in (18,23). Briefly, the small RNA size frac- 8/4 of C/D box sRNA genes may have an impact on the evo- tion was isolated by running total RNA in denaturing gel 99 lution and architecture of the genome. In mammals and electrophoresis and extracting the region below tRNAs. 65 7 nematodes,someC/DboxsRNAsappeartoduplicatevia Readswithbarcodesandlinkersremovedweremappedto 7 b a retrotransposon-like mechanism (12–14). These duplica- genomesusingBLAT(24).TheresultingPSLfilewaspro- y g tions may lead to new functions of the sRNAs, and, like cessed to determine paired reads. Supporting small RNA- ue othertransposons,playaroleingenomeevolution(15). seqdatacanbeinteractivelyvisualizedusingtheUCSCAr- st o Although other studies have detected C/D box sRNA chaeal Genome Browser and a companion track hub (in- n 0 geneduplicationandinstancesoftheiroverlapwithprotein- structions provided at http://lowelab.ucsc.edu/snoRNAdb/ 2 A codinggenes(16–18),nonehavebeencomprehensivewith TrackHubs.html). p theintentofunderstandingsRNAevolutionandfunction ril 2 0 within a genus. The Pyrobaculum genus is diverse yet pro- 1 Computational prediction and organization of C/D box 9 videsanidealgeneticdistancewhereorthologyandsynteny sRNAhomologfamilies ofsRNAgenescanstillbeclearlyestablished.Wesupported C/D box sRNA gene predictions with a combination of TogenerateacompleteornearlycompletesetofsRNAgene comparativegenomicsandsmallRNAsequencing(RNA- predictionswithinthegenusPyrobaculum(Supplementary seq)datafromfivePyrobaculumspecies(18,19):Pyrobacu- Table S1), we used computational covariance models and lumaerophilum(Pae),Pyrobaculumarsenaticum(Par),Py- smallRNAsequencingdataofPae,Par,Pca,PisandPog robaculumcalidifontis(Pca),Pyrobaculumislandicum(Pis) (data reported in (18) except for Pog). In a previous study andPyrobaculum oguniense (Pog).Inaddition,thespecies where we used small RNA-seq data from four species of Pyrobaculum neutrophilum (Pne; formerly Thermoproteus Pyrobaculum,weidentifiedseveralunannotatedtranscripts neutrophilus(20))wasusedtosupplementcomparativege- thatwerelikelytobeconservedC/DboxsRNAswithbox nomics analyses. After identifying a reference set of 526 features(specificallytheK-turnmotifformedbyboxbase high-confidence C/D box sRNA genes, the most compre- pairing) that were divergent from the canonical C/D box hensive set in any archaeal genus to date (16,21,22), we model (18). Therefore, we developed a covariance model 5680 NucleicAcidsResearch,2018,Vol.46,No.11 D o w n lo a d e d fro m h ttp s ://a c a d e m ic .o u p Figure1. OrganizationofPyrobaculumC/DboxRNAsinto110homologousfamilies.(A)ThetypicalstructureofanarchaealC/DboxsRNAisdepicted. .c o ThestructurecontainstwoK-motifs,theK-turnformedbytheinteractionbetweentheCandDboxsequencesandtheK-loopformedbytheinteraction m betweentheD(cid:2) andC(cid:2) motifs(blackdashedlines).ThetwoguideregionslocatedrespectivelybetweentheD(cid:2) andCboxesandbetweentheDandC(cid:2) /n a b5oofnxPtesysru(ogpbrsaetcreeunal)um,mbfairssoeminpdatiihcreawtseitdtah.rtSthpoeefcttiahersegaeDtre(cid:2)RooNrrdADer(ebodoraxbnasgseeeq)duaeonnndceam.pTehthhyiylsolaigsteionthneeti(cy‘Netlr+leofiewvdehe’terexuramlge.oinn(eB)d)obcTcyhue1r6snSiunmrRtbhNeerAtaorfaglieidgtennnmuticefilneetodt(i1sdR9e)N.thTAahsteirbnmaseoeapcrphoatieorusfswthtieetnhasixtxh(esTpgteuec)iideiess r/artic includedasanoutgroup.(C)C/DboxsRNAswereorganizedinto110homologousfamiliesbasedonsequencesimilarityoftheguidesandpredicted le-a targetsinrRNAandtRNAs.C/DboxsRNAnumbersindicatetowhichfamilyeachbelongs.Thus,PaesR01,ParsR01,etc.belongtothesR01family. b s C/DboxsRNAswerefirstgroupedintofamiliesusingtheoriginalannotationnumberinginPae(1–65)(25).AllotherC/DboxsRNAsweregrouped tra intofamiliesstartingatnumber100.ThemajorityofsRNAsfallintofamilieswithrepresentativesinallsixspecies. ct/4 6 /1 thataccommodatesorphanguidesandincorporatesboxse- We used Infernal v1.0 (26) to pick up 0-5 more predic- 1/5 quences,K-turnandK-loopstructure,andlengthofspac- tionsperspeciessinceitisslightlymoresensitivethanInfer- 67 8 ers (guides and variable loop) (Figure 1A). The two G:A nalv1.1(27)(SupplementaryTableS2).Itispossibletoin- /4 9 pairsoftheK-turnandK-loop,wereannotatedinthestruc- creasethesensitivityofInfernalv1.1usingthe–maxoption, 9 6 tural alignment used as input for the model. The variable butthenumberofcandidatestoevaluatemanuallybecomes 57 spacerregionsofarchaealC/DboxsRNAs,theguidesand prohibitiveasthenumberoffalsepositivescanincreaseinto 7 b y variableloop,wereannotatedtobeanybase.Thelengthof the hundreds or thousands. Infernal 1.0 and 1.1 produce g u these spacer regions in the model is based on the longest different subsets of candidates; we used both for the final e s observedlengthinthetrainingset. prediction set and manually curated the sRNAs that were t o n Theinitialcovariancemodelwascreatedbyusingahand- predictedbyonlyoneortheother. 0 curated multiple structural alignment of the 62 Pae C/D To improve specificity, candidates from genome scans 2 A p bsuobxseRqNueAnstcreopmoprtuetdatiinontahleagneanlyosmise(1s6e,q2u5e)n.Tcihnegsepaseprevredanads tlahpapteodveGrleanpbpaendkoRtheefrSaenqnporotatteeind-ncoond-incogdgienngeRsbNyAmsoorreotvhearn- ril 2 0 1 input to cmbuild from the Infernal v1.0 and v1.1 software 80% were discarded, unless they were supported by syn- 9 packages(26,27)withthehand-curatedoption–rfor–hand tenic ortholog predictions in other Pyrobaculum genomes. specified,respectively.Thecovariancemodelwascalibrated The model predicts sRNA box features; these were manu- withcmcalibrate.Afinalcovariancemodelwasbuiltfroma allycheckedandadjustedwhenrequired. completesetofPaesRNAsfoundfromexaminingsequenc- To find additional orthologs of sRNAs genes within ing data and using comparative genomics with the other the Pyrobaculum genomes not found by the covariance Pyrobaculum species (Supplementary File S1). This high- model,thegenomesweresearchedusingsRNAsequences qualitytrainingsetonlycontainsC/DboxRNAsthatare as queries to BLASTN (28). Genome Genbank/INSDC eitherconservedwithinthePyrobaculumgenusorhavecon- numbers are AE009441.1 (Pae), CP000660.1 (Par), firmingsmallRNA-seqdata.Weusedcmsearchtoscanthe CP000561.1 (Pca), CP000504.1 (Pis), CP001014.1 (Pne) genomesandsmallRNAsequencingdatausingtheglocal and CP003316.1 (Pog). Top hits were manually curated, (-g)andnoHMMfilter(–nohmm)options. based on predicted promoters, conservation and sequenc- ing evidence. Families of C/D box sRNA homologs were NucleicAcidsResearch,2018,Vol.46,No.11 5681 created based on sequence similarity of guide sequences RESULTS and predicted target sites of modification. The first 62 IdentificationandcurationofPyrobaculumC/DboxsRNA familieswereassignednumbers(1–65)basedontheprevi- genes ously reported Pae sRNA numbering (18). Pae sR10 was renamed Pae sR57a to reflect homology to Pae sR57 and Weidentified526C/DboxsRNAgenesfromsixspeciesof thesR57family.PaesR41 wasremovedfromannotations Pyrobaculum using evidence from (i) RNA-seq data from becauseofalackofsequencingreadsandnorecognizable Pae, Par, Pca, Pis and Pog, (ii) an improved computa- C(cid:2) or D(cid:2) boxes. Pae sR55 was also removed from further tionalcovariancepredictionmodeland(iii)comparativege- analysisbecauseitdoesnothavearecognizableDboxand nomics.NearlyallofthesRNAsfromthefivegenomeswith has no predicted orthologs in the six other Pyrobaculum RNA-seqdata(436/442,99%)arerepresentedbyreadsin genomesinthisstudy.Additionalfamiliescontainingnewly the RNA-seq libraries, and most were conserved in Pne. identifiedsRNAswerenumbered100to141. Thenewcovariancemodeldevelopedinthisstudyhadtwo The sequences, family organization and genomic loca- advantagesoverpriorsearchmethods:(i)incorporationof D o tion of these sRNAs can be found in Supplementary Ta- K-turn and K-loop structural information, and (ii) detec- w n ble S1, at the Lowe Lab Archaeal snoRNA-like C/D box tionofaguide–targetinteractionisnotnecessarytoiden- lo a RNA Database (http://lowelab.ucsc.edu/snoRNAdb/) and tify candidates, allowing prediction of sRNAs where both de d (29). UCSC Archaeal Genome Browser track hubs (30) guideslackcomplementaritytorRNAortRNAsequences fro enabling the visualization of the annotated small RNAs (1,31). m can also be found at http://lowelab.ucsc.edu/snoRNAdb/ http TrackHubs.html. s ://a c Most Pyrobaculum C/D box sRNA homolog families have ad e members in all six species. The 526 Pyrobaculum sRNA m Predictionofmethylationtargets glieensebsawseedreoonrgseaqnuizeendceinctoon1s1er0vdatifiofenreonftthhoemirogluoigdoeursegfaiomnis- ic.oup Toidentifytheputativesitesof2(cid:2)-O-methylationguidedby and predicted targets of methylation in tRNA and rRNA .co m Pyrobaculum C/DboxsRNAs,wescannedmaturerRNA (Supplementary Table S1). We found 26 previously unde- /n waDnedraentRdobNDtaA(cid:2)ignsueedqiduebesynocaefsgtlhfooebrsarRelgaNiloiAgnnss.moMfencaottumorfpeltrehRmeNesniAxtaPsreiytqyruotbeonatcchuees- Ptbeeccrate,odtfhsdrReeetNeicnAtePsdiinsC,tf/hoDiusrbsitonuxdPysoRg(tNawnAodignteePnnaeisen,iPnthnirene)de;itivhniedPutaoartl,asflopnueucrmiiens- ar/article lum rRNAs and removal of predicted introns. Intron sites ranges between 84 and 92 (Figure 1B). Grouping the sR- -ab are indicated in Supplementary Figures S1 and 2. A uni- NAsintofamiliesallowedustotracktheevolutionofC/D stra formnumberingsystemforsitesofrRNAmethylationwas box sRNA genes (gain, loss, methylation target changes) ct/4 obtainedbyconstructinganalignmentofthe16Sand23S within the genus and to predict target sites of methylation 6 /1 rRNAsequencessharedacrossallsixspecies(Supplemen- withinrRNAandtRNAwithgreatercertainty. 1 /5 tary Figures S1 and 2). The predicted locations of modifi- Mostofthehomologousfamiliesareconserved,with70 6 7 cation were mapped on the alignment and assigned a po- ofthe110families(64%)havingrepresentativesRNAsen- 8/4 sition based on the Pae rRNA numbering. For a predic- codedineachofthesixPyrobaculumgenomes.The40re- 99 6 tion to be considered credible, generally a minimum com- maining families have representatives missing from one or 5 7 plementarity of nine continuous Watson–Crick base pairs moreofthesixgenomes(Figure1C).Eighteenofthefami- 7 b centeringatornearthe‘N+five’positionwasrequired.The liesareuniquewiththerepresentativepresentinonlyasin- y g criteria were relaxed in two specific instances. First, if the gle species. Each of these 18 singleton sRNAs have small ue s majorityofmembersinansRNAgroupmettheprediction RNA-seqreadsand15haveatleastonepredictedtargetto t o criteria,thepredictionwasextendedtominoritymembers rRNA or tRNA. Within a family, it is common for both n 0 thatnearlymetthecriteria(forexample,matchescontain- guideregionstoexhibitahighdegreeofsequencesimilar- 2 A ingamismatchorG:Ubasepair).Second,ithasbeennoted ityindicativeofacommonancestry.Forexample,ofthe70 p that many sRNAs use their two guide regions to direct families that have representative sRNAs in all six species, ril 2 0 methylationstocloselyspacednucleotideswithinthetarget 62 exhibit a recognizable degree of sequence similarity in 1 9 RNA (16). Presumably, this enhances target identification both the D and the D(cid:2) guide regions among all members, and creates greater stabilization of the guide–target inter- whereas the remaining eight families have a conserved se- action within the RNP complex. Consequently, when one quence across all species in only one of the two guide re- guideexhibitsstrongcomplementaritytothetarget,thecri- gions (see Supplementary Table S1). Even when a partic- terionforthesecondguidematchisrelaxedif(i)itiswithin ular guide region is conserved, it is frequently punctuated 100ntofthefirstcomplementarity,(ii)theweakercomple- byveryshort(1–3nt)insertions,deletionsorsubstitutions, mentaritycontainsnomorethanonemismatchand(iii)the primarilyatthe5(cid:2) or3(cid:2) endoftheguidesthatarenotpre- combinedbitscoreforthetwocomplementaritieswas32or dicted to form part of the guide–target base pairing inter- higher (where a Watson–Crick base pair is 2, a G:U base action. Accordingly, not a single guide region is perfectly pair is 1 and a mismatch is −2). This empirically derived conserved in any of the 70 families with representatives in rule-basedmethodisappliedwiththePythonprogramfin- allsixspecies,emphasizingthatthesespecieshavediverged dAntisense.py.Descriptionoftheprogramandrelatedfiles enoughtoobservedifferencesinselectivepressurebetween canbefoundathttps://github.com/lmlui/findAntisense. functionalandnon-functionalsegmentsofsRNAs. 5682 NucleicAcidsResearch,2018,Vol.46,No.11 New computational model based on curated Pyrobaculum C/D box sRNAs. One of the important aspects of our detailed curation of archaeal sRNAs was the ability to generate a gold-standard set for training a computational model. After training the model with a curated alignment of Pae sRNA genes (see ‘Materials and Methods’ section) andobtainingscoredistributionsoftruepositivesandfalse positives,wedeterminedahighconfidencethresholdof17 bits(bestspecificity)andamoderateconfidencethreshold of13bits(highsensitivitywithsomelossofspecificity)for archaealC/DboxsRNApredictions(SupplementaryTable S2andFigureS3). Totesthowwellthismodelworksondivergentarchaea, D o weusedittosearchthreespeciesintheeuryarchaealgenus w n Pyrococcus: Pyrococcus abyssi (Pab), Pyrococcus furiosus lo a (Pfu)andPyrococcushorikoshii(Pho).Thegenuscontains de d mstuadnyyinsgRNCA/DgebnoexpsRreNdiActiostnrsucatnudrehaansdbfeuennctaiomno(d21el,2f2o)r. from WefoundsevenC/DboxsRNAsamongthesespecies(two http in Pab, four in Pfu and one in Pho, Supplementary Table s S3). These predictions are conserved with other Pyrococ- ://a c cussRNAsorhavesmallRNAsequencingevidence(Sup- ad e plementary Figure S4). We noted that on average the Py- m rococcus sRNAs have bitscores that are two points higher ic.o thanthePyrobaculumsRNAs,whichissurprisingsincethe up modelwastrainedonPyrobaculumsequences(Supplemen- .co m tary Figure S3A,C). Upon further analysis, we found that /n Pyrococcus C/D box sRNAs score higher than Pyrobacu- ar/a lumsRNAsbecausetheyhavelessvariationintheirboxse- rtic quencesandmorecanonicalK-turnscomparedtothePy- le robaculum(Figure2).Thelargersequencevariationinthe -ab s fPoyrArsopcbpaanrconuxilniummgao,thteholyewr3ea–vr1ecr1h,%ameaoay.fmknaokwetnhiCs/mDodbeolxmsoRrNesAengseitnivees aKFri-cgthuuarreenas2l.aCnC/doDKmb-ploaoxroispsRos.nN(AoAf).DPInyiarKogbr-ataumcrunloufsmtKuda-tniuedsr,nPthayenrodbcaoKsce-cluopsoaipCrs/pDaorseibtnioouxnmssbRineNraeAnd tract/46/1 in the Pyrobaculum and Pyrococcus genomes are not pre- startingwiththeG:Abasepairsandthefirstletterinthepairisfromthe 1/5 dicted with this covariance model. We examined the false Cbox(53,54).ThestrandwiththebulgeandtheCboxisreferredtoas 6 7 nfeeagtautrievseswaenredufnouusnudalt,hoaftteinnrmesouslttincagsiensnoonne-coarnmonoircealbKox- ‘t2bhb,e’2nafinrbsdetittnhwgeoAnp:oGons.-i(tbBiuo)lngSseedaqruseetrtnyacpneidcloawglloiytshpootfhsteihtieDobnob1xobxf1einastbureerefinesrgorefGdth:AteoPaaynsrdo‘nbp.ao’csTuitlhiuoumns 8/4996 motifs (see Supplementary Figure S3E for discussion). To 5 andPyrococcusgenera.CreatedwithRILogo(55).(C)Representationof 7 capturetheseunusualC/DboxsRNAs,anothermodelmay thevariabilityofbasepairtype(16possible)foundateachpositionofthe 7 b beneeded. K-turnorK-loop.Darkerboxesindicatethatthepositionismorevariable. y g u e s PredictionandanalysisofC/DboxsRNAmethylationtar- t o gets rRNA secondary structure in order to visualize clustering n 0 patterns (Supplementary Figures S5 and 6). As noted for 2 A Methylation targets in rRNA and tRNA were predicted other species, methylation sites cluster within functionally p using the ‘N+five’ rule (2) (Figure 1A), and hits were important regions, such as the peptidyl transferase center ril 2 0 rankedbasedonextendedcomplementaritybetweenguide andhelix69of23SrRNA.Comparisonswithpositionsof 1 9 sequencesandtargetRNAs(SupplementaryTablesS4and predicted modification in species outside of Pyrobaculum 5).Usingcriteriadescribedin‘MaterialsandMethods’,we indicatethattheprecisesitesofmodificationare,withafew wereabletopredictatleastonemethylationtargetfor89% notable exceptions, generally not conserved, although the (468/526) of the sRNAs. Across all possible guide regions clusteringpatternisconserved(17). (two per sRNA), 56% (587/1052) were predicted to medi- Notably,∼45%ofsRNAs(237/526)usetheirDandD(cid:2) aterRNAmethylationand16%(173/1052)werepredicted guidestotargetsitesthatarewithin100ntofeachotherin to guide tRNA methylation. Using this large set of genes theprimaryrRNAsequence(SupplementaryFiguresS1,2, and predicted target methylation sites, we describe results 5,6andTablesS4and5).Theseobservationsfurthersup- yieldingevolutionaryandfunctionalinsightsbelow. port prior studies that have suggested dual guide interac- tions may play an important role in mediating the folding AnalysisoftargetsinribosomalRNAsupporttheroleofsR- andstabilizationofnascentrRNAsandtheirassemblyinto NAs as rRNA folding chaperones. Positions of predicted ribosomalsubunits(17,32),particularlyinthermophiles.A methyl modification were mapped onto the 16S and 23S computationalstudysimulatingC/DboxsRNAchaperone NucleicAcidsResearch,2018,Vol.46,No.11 5683 function in rRNA folding also suggests that double guide example, in the sR33 family, the D and D(cid:2) guides target sRNAsmaybeespeciallyimportantforproperlong-range closely spaced positions in 23S rRNA in all six Pyrobacu- interactionsinrRNAs(33). lumspecies(Figure3).TheD(cid:2) guideofallmembersispre- Three sRNAs (sR2, sR53, sR56), conserved in all six dictedtobeincapableofmethylationatC2045becauseofa species,haveDandD(cid:2) guidesthathavecomplementarities C:Umismatch.TheD(cid:2) guide–targetinteractioniscredible andpredictedmethylationtargetsthataremorethan100nt becauseofitsstrongconservationanditscloseproximityto apartintheprimaryrRNAsequencebutarecloseinthesec- theDguideinteraction. ondary structure (Supplementary Figure S7). These long- In three cases, only one member of a family has a mis- rangeinteractionsoccurinthe16Stranslationalfidelityand matched base pair (Pis sR106:A608, Par sR09:U912 and centralcoreregions,whichareimportantforthefunctionof Pae sR44:C2117, all in 23S). For example, the sR09 fam- the ribosome (see Supplementary Figure S7 for more dis- ily has five members and the D and D(cid:2) guides are highly cussion).Wesuspectthattheselong-rangeinteractionsalso conserved, yet the Par sR09 D guide contains an A-to-U playanimportantroleinthetertiaryfoldingofrRNAdur- nucleotide substitution at the ‘N+five’ position, changing D o ingtheassemblyprocess. the guide–target interaction to U:U at this position (Sup- w n plementaryFigureS9).Theseguide–targetinteractionsmay lo a One-third of Pyrobaculum C/D box sRNAs target tRNAs. still play an important role in the localized folding of the de IthasbeenshownpreviouslythatarchaealC/DboxRNAs 23S rRNA in each species, even if one guide occasionally d fro can target modification to tRNAs as well as rRNAs (23). contains a mismatch impairing methylation (Supplemen- m ThetRNAmethylationtargetsareatstructurallyconserved taryFigureS9A). http positions that are modified by protein-only tRNA methy- s lasesinotherorganisms(16,34).Inourcollectionof110Py- ://a c robaculumsRNAfamilies,32arepredictedtotargetmodi- GuideswithnopredictedtargetsintRNAorrRNA(orphan ad e ficationto23differentpositionsinvarioustRNAs(Supple- guides). In the 526 different Pyrobaculum sRNAs, 28% m mentaryTablesS4and5).OfthesetRNA-targetingsRNAs, of guides (292/1052) show no significant complementar- ic.o about 80% (115/144) have one guide that targets a tRNA itytoeitherrRNAortRNAsequences(seeSupplementary up andtheotherguidehasnotarget.Dual-guidesRNAswith Tables S4 and 5). We also searched for mRNA and other .co m potential to target different sites in the same tRNA were non-coding RNA targets for these orphan guides, but no /n rare(sR46,sR61,sR123,sR126;SupplementaryTablesS4 significant,conservedcomplementaritieswereobserved,in- ar/a and5),incontrasttothemanydual-guidesRNAstargeting cludingorphanguidesconservedacrossallsixPyrobaculum rtic rRNA. species. le The number of different tRNAs that could be targeted Insomeinstances,orphanguidesappeartobetheresult -ab s by a particular sRNA guide varies over a wide range ofasmallnumberofnucleotidesubstitutions.Forexample, tra aunndiqureeflewchtsertehaes foatchterthsaatrseomshearseedquaemncoensginmtaRnNyAdsiffaerre- idnictthede stRo3ta0rgfaemtCily2,72th4einD2g3uSidrReNofAa,llwshixermeaesmthbeerDs(cid:2)isgupirdee- ct/46 /1 ent tRNA isoacceptors (Supplementary Figure S8). The is predicted to target C2708 in only four of the members 1 /5 region surrounding position 34, the wobble base in the (SupplementaryTablesS1and4).ThetwosRNAscontain- 6 7 anticodon, is an example of a fairly unique target se- ingthedivergedD(cid:2) guides(inthePogandParsub-lineage) 8/4 quence. Guides from four different sRNAs target posi- contain just two changes at the beginning of the guide re- 99 6 tion C34 or U34, and each has only a single tRNA target gion, dropping the guide–target pairing interaction to be- 5 7 (sR26:C34Trp,sR27:U34Gln,sR45:C34Val,sR46:U34Thr lowtheminimumthresholdof8bp.These‘orphanguides’ 7 b and sR51:C24Glu). Other guides have multiple potential thatarepartofancestraldualguidesRNAsmayinfactstill y g tRNAtargets;forexample,theDguideofPaesR64exhibits help mediate rRNA folding, albeit with a weaker interac- ue s complementaritytoaconservedsequenceinsixteendiffer- tion. t o enttRNAfamiliesanddirectsmodificationtopositionG51 Guide divergencealso appears tooccur via genomic ar- n 0 intheT(cid:2)Cstem. rangementsorsRNAduplicationthatresultinanoverlap 2 A between an sRNA gene and a protein-coding gene. Of the p Instancesofmismatchedbasepairsatthe‘N+five’positionof sRNAs that overlap the 5(cid:2)- or 3(cid:2)-end of a protein-coding ril 2 0 methylation. Invivoandinvitrostudieshavedemonstrated geneinthesenseorientation,88%and61%oftherespective 1 9 that a Watson–Crick base pair at the ‘N+five’ position in overlappingguidesdonothavepredictedtargetsinrRNA theguide–targetbasepairingregionisessentialformethyla- ortRNA(seelatersectionsformorediscussion). tionofthetargetRNA(7,35–36).Nonetheless,amismatch Theoriginofotherorphanguidesislessclear.Thereare at the site of methylation within a conserved guide–target afewinstanceswhereguidefamiliesareconservedbutonly complementarityimpliesthattheinteractionmaybebene- oneofthemembershastargets.Forexample,bothPaesR64 ficialbutthatthemodificationiseithernotneededorharm- and Pae sR101 have tRNA targets, but the five other ho- fultothefunctionofthetargetRNA.Studiesinyeastthat mologs in their families are dual orphan guides (Supple- usecross-linkingtodetectRNA–RNAinteractionssupport mentaryTablesS4and5).Theseguidefamiliesarerelatively thishypothesis(37,38). wellconservedwithonlypointmutations,anditisunclear InfourinstancesinthePyrobaculumsRNAset,wefind iftheseareinstancesofgain-orloss-of-targetingfunction. a conserved mismatch at the ‘N+five’ methylation posi- There are only three families (sR43, sR50 and sR108) in tion (sR116:U109 and sR25:C1368 in 16S; sR56:G764, which all six members have dual orphan guides. Interest- sR33:C2045 in 23S; Supplementary Tables S4 and 5). For inglyforsR43,bothguidesarehighlyconservedamongall 5684 NucleicAcidsResearch,2018,Vol.46,No.11 A sR33 Guide Family C box D’ guide D’ box C’ box D guide D box Mismatch Pae sR33 GUGAUGA GUAGAGGUUCAG GCGA UUG AUGAACG GCCUUCAGCGGCU CUGA Par sR33 GUGAUGA GUAGAGGUUCAA GCGA UGU GUGAGCG GCCUUCAGCGGCU CUGA Match Pca sR33 UUGAUGA GAAGUGGUUCAG GGGA GCG GUGAACG ACCUUCAGCGGCG CUGA Predicted site Pis sR33 GUGAUGA GUAGAGGUUCAG GCGA GAG AUGAGCG CCCUUCAGCGGCA CUGA of methylation Pog sR33 GUGAUGA GUAGAGGUUCAA GCGA AUGU GUGAGCG GCCUUCAGCGGCU CUGA Tne sR33 GUGAUGA GGAGAGGUUCAA GCGA GGC GUGAGCG CCCUUCAGCGGCA CUGA ||||| ||| ||||||||||| 3’ UCUCCCAGUC ----- 12 nts ---- CGGAAGUCGCCG 5’ 23S rRNA C*2045 23S rRNA G2021 D o w n A C lo B U A ad G A e d GG CC fro m C G h CC GG ttps C C ://a A G c a A A d e G A m U G ic G C .o u AC G G2021 p .c UG C om G C /n H68 GCG GCUG AA C UCU H69 ar/artic U AA GU G le-a CACC GG GG CCA C*2045 bstra G c 5’ U GG CC G CCUUU t/46/11/5 AA 3’ 67 8 /4 9 Figure3. Exampleofconservedinstanceofmismatchatthesiteofmodification(‘N+5’position)indicatesthatinsomecasesthetargetinteraction,rather 9 6 thanthemodification,isimportant.(A)SequencealignmentsofsRNAsinthesR33familyispresentedandtheirguidecomplementaritiesto23SrRNA 5 7 areindicatedbelow.Thecritical‘N+five’nucleotideintheguideregionsishighlightedinbluewhenthereisaWatson–Crickbasepairbetweentheguide 7 andtargetandinrosewhenthereisamismatchbasepair.MismatchedbasepairsarealsoindicatedbyanasteriskintherRNAposition.Theyellow by highlightrepresentsthe‘N+five’positionintherRNAtarget.(B)Thesecondarystructureofhelix68–69regionin23SrRNAisdepictedshowingthe g complementarityofthesR33DguidenearpositionG2021andtheD(cid:2)guidenearpositionC2045.AllsixsR33membershaveamismatchatposition2045 ue s atthe‘N+five’positionintheguide–targetinteraction.GreenbasesareribonucleotidesthatbasepairwiththeguidesofsR33.Thesecondarystructureof t o therRNAwasgeneratedbySSU-ALIGNpackage(56).SeeSupplementaryFigureS9foranotherexampleofaconservedmismatch. n 0 2 A p mtweomtbaergrsetas,nidftshoewyocouuldldbbeeeixdpeencttiefidedto. recognize the same Cmoemmpboesritfeamanidlietsr,afinsvpeohsaevdesRaNguAisd.ethOafttshhea1r8essRsoNmAerseinsegmle-- ril 20 1 blancetoaguideinadifferentsRNAfamily.Thesearedes- 9 ignatedaseithertransposedorcompositesRNAs(Table1). Transposed sRNAs share oneguide withanotherdefining Proliferation,mobility,plasticityandevolutionarydivergence family,buttypicallytheguidehasbeentransposedfromD ofC/DboxsRNAgeneswithinthePyrobaculumgenus toD(cid:2)orviceversa,fromD(cid:2)toDposition,comparedtothe defining family. Composite sRNAs have D and D(cid:2) guides Grouping the Pyrobaculum C/D box sRNAs into 110 ho- thateachmatchoneoftheguidesintwodifferentfamilies. mologousfamilieshasgreatlyfacilitatedourabilitytoob- Forexample,PcasR12/45hasaDguidethatissimilarto serve evolution of sequence features at a time-scale where theDguideofthesR12familyandaD(cid:2) guidethatissimi- variation is plentiful but homology can usually be estab- lartotheD(cid:2)guideofthesR45family(Figure4).Wesuggest lished, even for mobilized sRNA genes. Here, we describe thatthegenesencodingthesecompositeandtransposedsR- sequencesimilarityofguidesequences,whichimpartfunc- NAsaregeneratedbygenomicrearrangementsbetweendif- tion and are the most conserved, defining elements of ho- ferentsRNAsorsRNAgenes. mologousfamilies. NucleicAcidsResearch,2018,Vol.46,No.11 5685 D o w n lo a d e d fro m h ttp s ://a c a d e m ic .o u p .c o m Figure4. InterconnectedguidesequencesimilaritybetweendifferentfamiliesofsRNAs.Thecoloredsequences(green,magenta,blueandorange)indicate /n differentsequencesimilaritiesintheguideregionsofsRNAsoftheinterconnectedsRNAfamiliessR45,sR12/45,sR56andsR57.ThesRNAinthe a outgroupspeciesThermproteustenax(Tte;chromosomestart1584742)isrelatedtothesR57family. r/a rtic Table1. CompositeandtransposedC/DboxsRNAs le-a b s C/DboxsRNA Type D(cid:2)guide Dguide trac PcasR12/45 Composite SharedwithsR45familyD(cid:2)guide SharedwithsR12familyDguide t/46 PcasR103/109 Composite SharedwithsR103familyD(cid:2)guide SharedwithsR109familyDguide /11 PcasR26a Transposed SharedwithsR26familyDguide Notshared /5 6 PaesR57a Transposed Notshared SharedwithsR57familyDguide 7 8 PcasR57b Transposed SharedwithsR57familyDguide Notshared /4 9 ParandPogsR13a Transposed SharedwithsR13familyDguide Notshared 9 6 5 7 TwounusualtypesofsRNAs(compositeandtransposed)wereidentified.CompositesRNAshaveaDguidethatshowssequencesimilaritytoaguidein 7 onesRNAfamily,whereastheD(cid:2)guideshowssequencesimilaritytoaguidefromasecondsRNAfamily.Thesearegivenbothfamilynumbersseparated by byaforwardslash(/).TransposedsRNAshaveeitheraDguidethatissharedwiththeD(cid:2)guideofthedefiningfamily,orviceversa,aD(cid:2)guidethatis g u sharedwiththeDguideofanotherfamily.TransposedsRNAsareidentifiedwiththenumberofthedefiningfamilyfollowedbyalower-caseaorb.The e PaesR57aisconsideredasatransposedsRNAsincetheD(cid:2)guidenormallyassociatedwiththesR57isnotpresent(toviewthesesRNAsequences,see st o http://lowelab.ucsc.edu/snoRNAdb/) n 0 2 A p DgeunpeliccaatnionalosofsRocNcAurs.asDeuvpidliecnactieodnboyfathfuellh-liegnhglythssiRmNilaAr palniceaatriolynisdiennctliucdaelsthRe4s6Ra4g6enfaem,ainlyd,wPhaeerseRo6n2ly,wPohgerceoanntaainps- ril 2 0 1 PaesR113aand113b(SupplementaryFigureS10A),adu- parentpseudogenecanbefoundabout2.5Kbpaway(Sup- 9 plication not found in other species. The 5(cid:2)-flanking re- plementaryFigureS10B). gions in front of the two genes are unrelated but the 3(cid:2)- flankingregionshavesubstantialsimilarity.Downstreamof Superfamilies of sRNAs illustrate guide evolution. The the sR113a gene, sR08 is encoded on the opposite strand, sR45,12,56and57familiesappeartoshareacomplexlin- whereas the sR113b gene contains what appears to be the eage, based on analysis of their guide regions (Figure 4). remnantofthesRNAgenethathasbeenobliteratedbythe The sR56 and sR57 families represent a likely ancient du- PAE3005predictedopenreadingframe(ORF).ThesR113a plication appearing early within the Pyrobaculum lineage. andsR08genesareconvergentlytranscribedandseparated Both families have representatives in all six Pyrobaculum by a 1 bp intergenic space, a highly unusual arrangement species, and one family (sR57) is a circular permutation where transcription of one may interfere with the other. of the other (sR56). The closest out-group species for Py- There are small RNAseq reads for each of these sRNAs robaculum,Thermoproteustenax(Tte),containsonlyaho- (sR113a, sR113b, sR08). Other examples of apparent du- mologtosR56outofthesefoursRNAfamilies.TheDand D(cid:2) guides of sR56 target modifications to 16S U877 and 5686 NucleicAcidsResearch,2018,Vol.46,No.11 G908, respectively, and the D and D(cid:2) guides of sR57 tar- genomic instability, natural hotspots for insertion by mo- getmodificationsto16SG906andA879,respectively.The bileelements. sharedcoresequencebetweentheDguideofsR57andthe Five of the element copies were classified as C/D box D(cid:2)guideofsR56isUUCACCandthesharedcoresequence sRNAs(sR131,sR133,sR137,sR139,sR141)andcontain between the D guide of sR56 and the D(cid:2) guide of sR57 is moderately degenerate internal box sequences. These con- UCCUUUA.Thesecoressequencesareoffsetby2ntdue tainguidesthatareeitherorphanguidesorthattargetnon- toindelswithintherespectiveguidesandthisaccountsfor conservedtRNAtargets.Thegenomiclocationsofanother the2ntshiftintargetspecificities.Thetwoaberrant(trans- ten copies of this element, which were too diverged to be posed) members of the sR57 family (Pae sR57a and Pca considered as potential sRNA genes, are listed in Supple- sR57b) are circular permutations of each other and share mentary Table S6. A phenomenal 13.6% of the uniquely only a single guide (UC-CC-CUU, dashes indicate indels) mappedRNA-seqreadsinPcaweregeneratedfromasin- with the core sR57 family. Archaeal sRNAs are known to gleMITE-likelocus(sR141),whileallotherMITE-likeele- circularize(23,39–41),sowehypothesizethatthesR57fam- mentshavesmallRNAseqreadabundancessimilartoother D o ily may be an example of circularization and re-insertion sRNAs.Onaverage,anon-MITE-likePcasRNAhas0.5% w n intothegenome. of total unique reads mapped to it, with the highest per- lo a ThesR12familyisalsoimplicatedinthiscomplexinter- centageofuniquelymappedreadstoanon-MITE-likePca de cqounenneccetsioimniloafriftaymtoilieths.eIDt (cid:2)hgausiadeDo(cid:2)fgtuhiedesRth56atfaexmhiilbyit(sCsUe-- saRhNigAhebrepinegrc4e%nt.aTgeheofMaInTtiEse-lnikseerseRaNdsAcsoamlspoatreenddtotoohthaveer d from UC-CCUC).IndelsinthesR12D(cid:2)guidechangesthetarget sRNAs.Onaverage,39%ofreadsfromaMITE-likesRNA http specificitytoposition23SG1221.Asmentionedabove,the locusareantisense,whereasonaverage9.3%ofreadsfrom s D guide in the sR12 family is shared with the D guide of other Pca sRNAs are antisense. We suggest that this ele- ://a c the composite sR12/45. The second D(cid:2) guide of sR12/45 ment may play a role in the generation, mobilization and ad isderivedfromthesR45family;interestingly,thisguideis proliferationofC/DboxsRNAsortheirmodularcompo- em predictedtotargetmethylationtopositionC34intheanti- nents. In some eukaryotic species, such as mammals and ic.o codonloopoftRNAVal. platypuses, snoRNAs are known to duplicate similar to up The relationships between these four related families il- retrotransposons(13,14);however,toourknowledgethisis .co m lustrate several important aspects of sRNA gene evolu- thefirstinstanceoffindingthistypeofduplicationofC/D /n t(iroesnusltiinncglufdrionmg: i(ni)segretnioend/dueplleitciaotnio)no,r(iid)ivtaerrggeentcmeig(rreastuioltn- beloexmseRntNsAinsitnhearocthhaeera.fiWveePdyidronboatcudleutmectsptheecsieesM, aIlTthEo-uligkhe ar/artic ing from nucleotide substitution) that alters or abolishes theymayexistinlowercopynumbersandwithgreaterse- le guide–targetinteractions,and(iii)rearrangements,includ- quencedivergence. -ab ingguidereplacementand/orcircularpermutation. stra c AssociationofC/DboxsRNAswithothernon-codingRNAs. t/46 MITE-like elements resembling sRNAs. Miniature Most archaeal C/D box sRNAs are independently tran- /11 inverted-repeat transposon elements (MITEs) are Class scribed,butinafewcasesC/DboxsRNAgenesareknown /56 7 II transposons that occur in plants and other archaea tobepolycistronic(1).TranscriptionofarchaealC/Dbox 8/4 (42) and are characterized by a combination of terminal sRNAsgeneswithprotein-codinggeneshasbeenreported 99 6 inverted-repeatsandinternalsequencestooshorttoencode in Sulfolobus and Pyrococcus species (16,19); in Nanoar- 5 7 proteins. A careful analysis has also revealed the presence chaeum equitans, a few instances of di-cistronic C/D box 7 b ofaMITE-likeelementpresentinatleast15copieswithin sRNA–tRNAtranscriptshavealsobeenreported(43). y g the Pca genome (Figure 5 and Supplementary Table S6). AmongthedifferentPyrobaculumspecies,thetranscrip- ue s We found these elements after observing that many of the tional relationships between sRNA and other non-coding t o familieswithonlyonesRNAmemberoccurinPca(Figure RNA genes can be dynamic. We identified a novel, C/D n 0 1C) and taking a closer look at these Pca sRNAs. Pca box sRNA dicistron (sR101 and sR21) in Pae, confirmed 2 A exhibitsmodularduplicationsandrearrangementsbetween bysmallRNAseqreads(SupplementaryFigureS11),which p and within sRNA families as evidenced by the transposed appearstobesharedinPisandPcabasedongenomicprox- ril 2 0 andcompositesRNAsdescribedabove(seeTable1). imityandoverlappingRNA-seqreads(Figure6A).Wehy- 1 9 In Pca, these elements have characteristics of both sR- pothesize that sR100 may also be part of the polycistron NAs and MITEs. Each identified copy contains near- due to its proximity. An independent predicted promoter consensus C box and D box sequences, but highly degen- existsforsR100,butitispossiblethatitistranscribedfrom erate internal D(cid:2) and C(cid:2) sequences. Both guide sequences both promoters as there are instances in other prokary- (adjacenttoDandD’boxes)exhibitonlymodestsequence otes where small RNAs can be transcribed from multiple similarityacrossthe15copies.Highlyconservedimperfect upstreampromoters(44,45).Threespecies(Pne,Par,Pog) inverted-repeatsequencesflanktheCandDboxes(Figure have lost sR100, but maintain synteny between sR21 and 5).Theelementshavealargeaveragedistance(322nt)from sR101. In Pis and Pne the sR34 and sR40 genes are also thenearestprotein-codinggenecomparedtoothersRNAs co-transcribed(basedonRNA-seqreads)andseparatedby (22 nt). The presence of these MITE-like elements in re- 10,and−4nt,respectively.InPnetheDboxofsR34islo- gions of the genome where there are no other annotated catedwithintheCboxofsR40(4ntoverlap);itisunclear genomicfeaturesandthatdonotalignwithotherPyrobac- howthisoverlapaffectsthematurationofthetwosRNAs. ulum genomes suggests that they are located in regions of InPar,PogandPcathegenesareseparatedby16,16and NucleicAcidsResearch,2018,Vol.46,No.11 5687 D o w n lo a d e d iFlliugsutrrea5te.tMheIhTiEgh-lidkeegereleemoefnctoinnsethrveePdcsaeqguenenocmees.imThilearcihtyroinmtohseo5m(cid:2)eanodfP3(cid:2)cflaacnoknitnagininsv1e5rtceodprieepseoaftaseMquITenEc-elsik(eblsuReNhiAghelliegmhte)n.tT.hTehseRsNeqAu-elnikceesseaqreueanlicgense(dyetlo- fro m lowhighlight)containcanonicalCandDboxesbutgenerallydegenerateD(cid:2)andC(cid:2)boxes(boxed).TheconservationbetweentheDandD(cid:2)guidesequences h inthe15elementsismoderatewithaconsensussequenceatthebottom.FiveoftheseelementswerecategorizedasauthenticsRNAs.SupplementaryTable ttp S6containsthelocationsoftheMITE-likeelementsthatwerenotcategorizedassRNAs. s://a c a d (Figure6).Withinthepolycistronicexample,thesR100was e m lostfromthetranscriptionunitinthePar/Pog/Pnelineage ic andinPogandPartheremainingsR100andsR101genes .ou p developed individual promoters. Similarly, the sR44 gene .c appearstohavebecomelinkedtothetRNAMet geneinthe om ancestorofPae,Pis,Pne,PogandParlineage;Pcaisanout /na grouptothesespeciesanddoesnothavethesamesRNA– r/a tRNAlinkage(Figure6B).Separatepromotersforthetwo rtic genesoccurinthePog/Parsub-lineage. le-a b s tra c Figure6. GenomiccontextofsRNAgenes.(A)Genomicorganizationof ImpactofoverlappingofC/DboxsRNAgenesandprotein- t/4 thesR101(blue),sR21(red)andsR100(yellow)genesinthesixspeciesof codinggenes 6/1 Pyrobaculum.The16SrRNAphylogenetictreewithTteastheoutgroup,is 1 illustratedontheleft;thesRNAgenelocationsaboveabpdistancescaleis In a previous study (18), we noted that Pyrobaculum C/D /56 itlilvuesotrfattheedstRo1th0e0rgiegnhetifnorPtnhee,PPoygro,bPaacrualunmdisnpPecairesa.nTdhPeroegitshneoreriesparnes∼en2t0a0- boxsRNAsgenesareover40-foldmorelikelythantRNA 78/4 ntinsertionbetweenthesR101andsR21genes.(B)LinkageoftRNAand genestohaveconservedoverlapwithorthologousprotein- 99 sRNAgenes.InPae,PisandPnethesR44genes(greenarrows)arelo- codinggenes.Otherstudieshavealsonotedthe3(cid:2)-antisense 65 tPcoaotgaebdaon8udtnPt1as0r0otrhnletesdssaisnftrdaonamcpeptbheeaetrw3t(cid:2)eoeennbdetheoexftpaRreNtsRsAeNdMAeftrMoaemntdgsesenRpea4r(4abgtleaecnpkeroiasmrirnooctwreesra)s..seIIdnn oWveerlloaopkoedfCm/oDrebcolxosseRlyNaAttshwisitrhelpartoiotneisnh-icpobdeincagugseenoevse(r1l6a)p. 77 by g Pca,sR44islocated∼13KbpsdownstreamofthetRNAMetgene.There couldimpactthefunctionofbothgenetypes.Inaddition, ue isnorepresentativeofthesR44familyinTte. antisenseinteractionssuggestthepossibilitythatC/Dbox st o sRNAsmightguidemodificationofmRNAsorbeinvolved n 0 inantisenseregulation. 2 A 78nt,respectivelyandareconvergentlytranscribedwhereas In the set of 526 Pyrobaculum C/D box sRNAs, 97 ex- p inPaethetwogenesareseparatedbymorethan2000nt. hibiteitherpartialorcompleteoverlapwithprotein-coding ril 2 0 Plant species and the archaeon Nanoarchaeum equitans genes(Figure7;SupplementaryFigureS12andTableS7). 1 9 have C/D box sRNA genes that are reported to be co- For this analysis, we considered only overlaps that extend transcribedwithtRNAs(43).InthePyrobaculumgenus,we either into the D(cid:2) guide region (8 nt or more beyond the findasinglecaseofaC/DboxsRNAfamily(sR44)thatis 5(cid:2)-endofthesRNAgene)orintotheDguideregion(5nt likelyco-transcribedwithelongatortRNAMet,althoughthe ormorebeyondthe3(cid:2)-endofthesRNAgene)sinceshorter spacingbetweenthetwogenesisextremelyvariable,rang- overlapsendingintheDboxorCboxwerenotexpectedto ingfrom8nt(Pae,Pis,Pne)apartto100nt(Par,Pog)to13 impacttargetspecificity.Wenote,however,therearemany KbpinPca(Figure6B). instances(23total)wherethe5(cid:2)endofaslightlyoverlapping TheseexamplesdemonstratefluidityofC/DboxsRNA downstreamsRNAgene(samestrand)providesthetransla- co-transcription with other non-coding RNAs within the tionstopcodonfortheupstreamprotein-codinggene(e.g. Pyrobaculum genus and preference for individual promot- CboxRUGAUGA).Weclassifiedthemoreextensiveover- ers.NoneofthefoursRNAsdiscussed(sR21,sR100,sR101 lapsintosevencategories(Figure7;SupplementaryFigure andsR44)havehomologsintheout-groupspeciesTte,so S12andTableS7).Instancesofoverlapwiththe5(cid:2)-endofa these sRNAs probably arose in the Pyrobaculum lineage predictedORFwerecheckedmanuallytoconfirmthatthe
Description: