GBE

Analyses of Charophyte Chloroplast Genomes Help Characterize the Ancestral Chloroplast Genome of Land Plants

Peter Civáň1, Peter G. Foster2, Martin T. Embley3, Ana Séneca4,5, and Cymon J. Cox1,*

1 Centro de Ciências do Mar, Universidade do Algarve, Faro, Portugal
2 Department of Life Sciences, Natural History Museum, London, United Kingdom
3 Institute for Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne, United Kingdom
4 Department of Biology, Faculdade de Ciências da Universidade do Porto, Porto, Portugal
5 Department of Biology, Norges Teknisk-Naturvitenskapelige Universitet, Trondheim, Norway

*Corresponding author: E-mail: [email protected].

Accepted: March 23, 2014

Data deposition: The chloroplast genome sequences reported in this article have been deposited in GenBank under the accessions Klebsormidium flaccidum KJ461680, Mesotaenium endlicherianum KJ461681, and Roya anglica KJ461682.

Abstract

Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae).
The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes.

Key words: charophytes, bryophytes, land plants, chloroplast genomics.

© The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

Genome Biol. Evol. 6(4):897–911. doi:10.1093/gbe/evu061 Advance Access publication March 28, 2014

Introduction

It is now established that land plants evolved from freshwater green algal ancestors of the charophyte algae (McCourt 1995; Karol et al. 2001; Wodniok et al. 2011). The transition of plants from an aquatic to the terrestrial environment is thought to have occurred about 425–490 Ma (Sanderson 2003; Wellman et al. 2003; Gensel 2008; Rubinstein et al. 2010) and was followed by a rapid diversification of plant lineages that resulted in dramatic changes to the Earth's biosphere (Kenrick and Crane 1997; Lenton et al. 2012). Given the great evolutionary significance of the colonization of land by plants and the fundamental role of plants in Earth's ecosystems, the characterization of the ancestor of embryophytes has long been of special interest to evolutionary biologists. From the cytological, physiological, and biochemical perspective, it is evident that some of the features typically associated with land plants have their molecular origins in the preterrestrial era. Such features include multicellularity and three-dimensional growth, cellulosic cell walls, phragmoplast formation during cell division, or intercellular communication mediated by plasmodesmata, and plant hormones (Leliaert et al. 2012). Although these features are indeed fundamental to land plants, some of them involve genes that appear to have orthologs in charophyte algae (Timme and Delwiche 2010; De Smet et al. 2011). A better understanding of the evolution of embryophyte design is therefore dependent upon an improved understanding of streptophyte relationships but is currently hindered by the paucity of charophyte nuclear and organellar genomic data available for study.

Phylogenetically, extant charophyte (Charophyta) lineages form a paraphyletic assemblage with the land plants (Embryophyta) and are together classified as Streptophyta. However, elucidation of phylogenetic relationships among the charophyte groups, namely, Chlorokybophyceae, Mesostigmatophyceae, Klebsormidiophyceae, Zygnematophyceae, Charophyceae, and Coleochaetophyceae, with respect to the land plant clade, has been controversial. Early phylogenetic studies appeared to provide evidence for an intuitively elegant progression of increasing morphological complexity from single-cell organisms of the Chlorokybophyceae, Mesostigmatophyceae, and Klebsormidiophyceae, through the multicellular, filamentous, and thallose structured algae of the Zygnematophyceae (conjugating algae) and Coleochaetophyceae, with the most complex, and most land plant-like, species of the Charophyceae being most-closely related to the land plants (Karol et al. 2001; McCourt et al. 2004). This same tree topology where Charophyceae are the sister group to land plants was again obtained in a six-gene phylogenetic analysis by Qiu et al. (2006). However, in the same study, gene matrices derived from complete chloroplast genomes yielded a highly supported monophyletic Zygnematophyceae clade as the sister group to land plants. More recent analyses based on chloroplast (Turmel et al. 2006, 2007) and nuclear phylogenomic data (Wodniok et al. 2011; Laurin-Lemay et al. 2012; Timme et al. 2012) place Zygnematophyceae, or a clade uniting Zygnematophyceae and Coleochaetophyceae, as the closest group to the land plants, whereas mitochondrial gene data sets remain inconclusive (Turmel et al. 2013). Currently, the best-supported hypothesis of charophyte branching order has a clade uniting Chlorokybus and Mesostigma at the base of the streptophyte tree with Klebsormidiophyceae, then Charophyceae, the next two diverging lineages, respectively, with the closest relatives of land plants, either the Zygnematophyceae alone or a clade consisting of both Zygnematophyceae and Coleochaetophyceae (Turmel et al. 2006; Wodniok et al. 2011; Laurin-Lemay et al. 2012; Timme et al. 2012).

Photosynthetic organelles have a clear functional continuity spanning the transition period between aquatic algal and terrestrial embryophytic lifestyles. With a typical genome size of between 115 and 170 kb and a gene complement of 100–120 unique genes (Green 2011; Wicke et al. 2011), the streptophyte plastid gene repertoire is relatively stable because retention of the core set of chloroplast genes is likely under strong selection, and gene gains are exceptional (Wicke et al. 2011). Although many of the genes necessary for chloroplast-specific functions have been transferred to the nucleus and have their products imported into chloroplasts from the cytoplasm, the genes encoding transmembrane polypeptides (subunits of atp, ndh, pet, psa, and psb complexes) tend to be retained by the chloroplast genome (cpDNA), presumably because importing the protein products of these genes would be difficult (Wicke et al. 2011). Other plastid genes exhibit high expression levels at early developmental stages (e.g., genes for structural RNAs, ribosomal proteins, and the RNA polymerase), which likely favor their localization in the chloroplast rather than the nucleus (Wicke et al. 2011). A stable gene content of the chloroplast genome is accompanied by a conserved structural organization of its circular map whereby two inverted repeats (IRs) are separated by a large single-copy (LSC) region and a small single-copy (SSC) region. As this quadripartite architecture likely confers physical resistance to recombinational losses (Palmer and Thompson 1981), structural changes to chloroplast genomes are infrequent, and their identification and distribution can be used to supplement sequence data in the evaluation of phylogenetic hypotheses (Qiu et al. 2006; Turmel et al. 2006, 2007; Jansen et al. 2008; Grewe et al. 2013). Although gene losses are often homoplastic (Martin et al. 1998), other rarer genomic changes such as large inversions, insertion, and deletion events (indels), intron gain and loss, or gene order rearrangements may provide reliable phylogenetic information (Rokas and Holland 2000).

The gene complements of land plant chloroplasts do not differ substantially from those of charophyte algae (Turmel et al. 2006; Green 2011; Wicke et al. 2011). Moreover, most introns found in embryophyte chloroplast genes are also present in charophyte chloroplasts and had been acquired before the transition to land (Turmel et al. 2006). However, although the chloroplast gene order among land plant groups is fairly stable (fig. 2, Wicke et al. 2011), dozens of sequence inversions separate the known charophyte chloroplast genomes from one another and from the conserved gene order found in bryophytes (Turmel et al. 2005, 2006). Chloroplast genome rearrangements are especially abundant in Zygnematophyceae, and it has been suggested that their high occurrence is causally related to the loss of quadripartite structure in this class (Turmel et al. 2005). However, a satisfactory mechanistic explanation of such causality is lacking and a broader examination of the zygnematophycean cpDNA architecture has yet to be conducted.

Here, we report newly sequenced chloroplast genomes of three charophyte algae, namely, Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). Klebsormidium flaccidum is a species from the last taxonomic class of charophyte algae to lack a completely sequenced chloroplast genome. The two zygnematophycean taxa are both saccoderm desmids of the previously unsampled family Mesotaeniaceae and thought to be early diverging or transitional forms of conjugating algae. The three genomes aid our understanding of the structural changes that occurred in chloroplasts during the evolution of early-diverging streptophyte clades. Moreover, comparisons of the genetic composition of chloroplast genomes ancestral to embryophytes with those of the Zygnematophyceae reveal several uniquely shared features that corroborate the close phylogenetic relationship of these plant groups.

Materials and Methods

Algal Cultures and Chloroplast Genome Sequencing

Cultures of K. flaccidum ([Kützing] P.C. Silva, K.R. Mattox, and W.H. Blackwell, 1972) and M. endlicherianum (Nägeli, 1849) were obtained from the SAG Culture Collection of Algae (accession numbers SAG 121.80 and SAG 12.97, respectively) and R. anglica (G.S. West in W.J. Hodgetts, 1920) (accession number ACOI 799) from Algoteca de Coimbra (hereafter we refer to the samples as "Klebsormidium," "Mesotaenium," and "Roya," for brevity). Klebsormidium and Mesotaenium cells were inoculated on Petri dishes with 1.5× Bold's basal medium (Andersen et al. 2005) supplemented with agar (1.5%, w/v) and cultivated for 10–14 days in a growth chamber under 14 h:10 h light:dark regime (100–120 µmol s⁻¹ m⁻² irradiation). Roya was grown in a liquid mixture of LC (Algoteca de Coimbra, Portugal) and Bold's basal medium (1:1) under the same light conditions as above. After approximately 1 month, the culture of Roya was passed through a 20–25 µm filter paper, the cells collected on the filter were rinsed with sterile 0.5× medium, and used for DNA extraction.

Approximately 1 g of cells was harvested for each taxon. The samples were briefly deep frozen in liquid nitrogen and used for DNA extraction without any further mechanical cell breaking. The frozen cells were resuspended in 5–10 ml of extraction buffer (0.1 M Tris; 20 mM Na2EDTA; 1.4 M NaCl; 2% CTAB [w/v]; 0.3% 2-ME; 0.1 mg ml⁻¹ RNase A; pH ~8.5) and incubated for 1 h at 65 °C with occasional vortexing. Subsequently, the tubes were chilled on ice, the DNA was extracted with equal volume of chloroform:isoamylalcohol (24:1) and precipitated with isopropanol for 1 h at −20 °C. The precipitate was collected and rinsed with wash buffer (70% ethanol; 0.12 M sodium acetate) and 70% ethanol. The pellet was dissolved in TE overnight, and the DNA was purified with High Pure PCR Product Purification Kit (Roche) according to the manufacturer's instructions. Quality of the DNA was checked on an agarose gel, and DNA quantity and purity were determined by nanodrop. Mesotaenium was sequenced on ½ picotiter plate with GS FLX Titanium (IGSP Genome Sequencing & Analysis Core Resource, Duke University), whereas Klebsormidium and Roya were sequenced on a single lane of Illumina HiSeq2000 (BGI Tech Solutions Co. Ltd, Hong Kong, China) along with four other samples not reported here. The library type for Illumina sequencing was 91 paired end, with approximately 500 bp fragment size.

Data Processing and Assembly

Roche 454 pyrosequencing and Illumina short-read data were imported into Geneious 5.6.3 (Biomatters, http://www.geneious.com, last accessed April 8, 2014) in sff and fastq formats, respectively. After the removal of oligonucleotide adapters, sequences were trimmed from both sides, discarding regions with >4% (sff) or >5% (fastq) chance of an error per base. As the data were from a whole-genome shotgun collection of sequences but only the chloroplast fraction was of interest, the assembly of the chloroplast genomes was undertaken in three stages. 1) For each taxon, a reference was chosen from the set of known chloroplast genomes of streptophytic algae. From each reference genome, protein-coding genes were extracted and used as templates for mapping of the sequence reads in Geneious. This reference-guided reconstruction typically yielded a set of short (0.1–1 kb), high-confidence chloroplast contigs representing <10% of the genome. 2) The full paired-read data sets were used for focused assembly by PRICE (Paired-Read Iterative Contig Extension, version 0.18; Ruby et al. 2013), utilizing the short chloroplast contigs as initial seeds. In PRICE assemblies, the minimal overlap was set to 30, and the minimal percent identity to 95 and 85 for Illumina and 454 data sets, respectively. For 454 sequence reads, the -spf argument was used to create false paired-end data file. Variable trimming and filtering options were applied. 3) Resulting contigs usually representing the whole reconstructed genome were imported back to Geneious, where the sequence reads were remapped onto the contigs. This third stage enabled the sequence coverage and base-assignment confidence to be evaluated, and the identification and adjustment of ambiguous sites and repeated regions.

Special attention was paid to the reconstruction of the IR regions of the chloroplast genomes. In a standard PRICE assembly of a quadripartite-structure chloroplast genome, one of the following problems may occur: two IRs are collapsed into a single contig; extension of the second IR stops due to reads mapping to an existing contig; and IRs and single-copy regions are joined incorrectly. To overcome these issues, a simple strategy was applied. After an IR was identified in preliminary runs of 2)–3) assembly steps, a "dead" IR contig was prepared and added to the initial seeds for another 2)–3) assembly run. The "dead" IR consisted of an IR region extended for approximately 500 cytosines on both ends, which effectively excludes this contig, as well as all the IR-mapping reads, from the PRICE assembly process. The remaining seeds are extended until the completion of SC regions, which contain short overlaps with the IRs, enabling the four regions to be joined correctly into a circle.

Annotation and Analyses of the Chloroplast Genome Content

The software DOGMA (Wyman et al. 2004) was used for initial gene annotations. Thereafter, a thorough examination of protein-coding gene content was performed as follows. Open reading frames (ORFs) in the assembled genomes were identified by getorf (part of the EMBOSS suite: minimal length 30 nucleotides, translations from start to stop codon retrieved), and BLASTp (Altschul et al. 1990) was used to detect similarities with a National Center for Biotechnology Information (NCBI) Reference Sequence (refseq) library of all known chloroplast proteins (downloaded in October 2012). After the annotation of known proteins, we further examined longer ORFs from conspicuously "empty" regions to determine whether these regions had unreported homologs. To facilitate these analyses, we built a custom BLAST database of plastid ORFs (determined by getorf: minimal length 100 nucleotides; translations from stop to stop codon retrieved) from all available Viridiplantae (chlorophytes and streptophytes) chloroplast genomes (downloaded from NCBI GenBank October 2012). This library consisted of 1.17 million ORFs and was used in BLASTp analyses to identify sequences with similarity (E value < 1e-4) to the ORFs identified from the "empty" regions of the newly assembled genomes. Introns were identified by comparison to gene alignments of other algae and representative bryophytes. Exon–intron borders were inferred with the aid of protein alignments and intron border consensus sequences (Sugita and Sugiura 1996). To determine the frequency of short repeats, one IR was removed from the quadripartite genomes, and direct and inverted repeats >20 bp were searched with a 1e-03 threshold using REPuter (Kurtz et al. 2001), and an average number of repeats per kb was calculated. The newly constructed genomes were visualized using circular genome maps created by OGDraw (Lohse et al. 2007).

Gene contents of the newly reconstructed chloroplast genomes were compared with the genomes of other streptophyte algae and a "hypothetical land plant ancestor" (HLPA). The gene content of the HLPA unit was inferred from a selection of taxa representing all major lineages of land plants (the same taxon set as used in the phylogenetic analyses below), assuming monophyly of land plants and only vertical transfer of genes. Genome rearrangements between charophytes and two land plants, namely, Pellia endiviifolia (a liverwort; NC_019628), and Isoetes flaccida (a lycophyte; NC_014675), were determined using multiple genome rearrangements (MGR: Bourque and Pevzner 2002) using analysis that ignored the transfer RNA (tRNA) genes and one of the IR in quadripartite genomes. Because the choice between the two IR copies is relevant for the gene order analyses, both alternative "single-IR" gene orders were considered for quadripartite genomes, and the arrangement leading to the most parsimonious result was chosen for pairwise genome comparisons in MGR.

Phylogenetic Analyses

Phylogenetic analyses of 83 protein-coding chloroplast genes from the newly assembled genomes of Roya, Mesotaenium, and Klebsormidium, plus 23 streptophytes and four chlorophyte outgroup taxa, were conducted. Maximum likelihood and Bayesian Markov chain Monte Carlo (MCMC) analyses were conducted using among-site (PhyloBayes CAT model; Lartillot and Philippe 2004) and among-lineage (P4 NDCH model; Foster 2004) composition models to determine the best-fitting models and the best-supported trees. Details of these analyses are presented elsewhere (Civáň et al. unpublished); the best-fitting PhyloBayes CAT-model using the gcpREV exchange rate model (Cox and Foster 2013) analysis of amino acids is presented here as a reference tree for the best-supported hypothesis of relationships based on these data.

Analyses of chloroplast genome structural features were based on 66 parsimony informative characters: The presence or absence of 30 monocistronic genes and 19 group II introns, plus the gene complement and gene order in 17 operons. tRNA genes and their introns were not considered, except for those tRNA genes located within polycistronic units. Introns were scored as Dollo characters with "absence" assumed to be the ancestral condition. Dollo character coding corresponds to a model in which each derived state is allowed to originate only once during evolution, and all homoplasy takes the form of reversals to the ancestral condition (Swofford and Begle 1993). The ancestral state of 28 monocistronic protein-coding genes was assumed to be "presence" with the character states treated as irreversible, therefore allowing multiple losses but no secondary gain of genes. The "presence" or "absence" of 77 additional genes within 17 operons was also evaluated (Sugita and Sugiura 1996; Wicke et al. 2013; information regarding the operonic organization is derived from model angiosperms but was adapted for the gene set observed here). Operons were coded as multistate characters defined by step matrices, with unspecified ancestral states. In the step matrices, every change in operon organization was of equal distance except irreversibility of genes lost from the genome (i.e., gene loss from an operon equals distance 1; gene gain in an operon from another cpDNA location equals distance 1; and gene gain in an operon from outside of cpDNA equals infinity). Structural characters were subjected to parsimony analysis in PAUP 4.0 (Swofford 2003), with optimal trees obtained using the branch-and-bound algorithm. Bootstrap analyses with 1,000 replicates were performed heuristically with default parameters. (A NEXUS formatted character matrix used for the structural data analyses is included in the supplementary material, Supplementary Material online.)

Results

Chloroplast Genome Assembly

For each of Klebsormidium, Mesotaenium, and Roya, assembly of the short-read data yielded a single large contiguous
sequence for which it was possible to close into a circle. Because of high sequence read coverages, 153, 379, and 273 mean reads per site for Klebsormidium, Mesotaenium, and Roya, respectively, no gaps or ambiguous regions were present (supplementary fig. S1, Supplementary Material online). Summary statistics of the data and the genome assemblies are presented in table 1.

Table 1
Summary Statistics of the Genome Assembly Data

Taxon                       Platform             Reads obtained  Mean read length     Reads mapping to   cp genome    Coverage (×)
                                                 (total)         after trimming (bp)  cp genome (%)      length (bp)  Mean   Min  Max  SD
Klebsormidium flaccidum     Illumina HiSeq2000   46,124,918      86.5                 0.66               176,832      152.7  6    235  23.6
Mesotaenium endlicherianum  Roche 454            689,398         357.1                23.2               142,017      378.9  85   589  72.2
Roya anglica                Illumina HiSeq2000   54,070,476      86.2                 0.78               138,275      273.3  1a   518  105.1

a The 1× coverage was 10-bp long and located within an AT-rich intergenic region.

Klebsormidium flaccidum

The chloroplast genome of Klebsormidium was assembled into a circular map of 176,832 bp (fig. 1A; NCBI GenBank accession number KJ461680); the third largest among currently sequenced streptophyte chloroplast genomes, smaller only than Pelargonium (Geraniaceae, Spermatophyta) and Chara (Charales, Charophyceae). The genome has a quadripartite organization, which differs from the typical embryophytic architecture by having exceptionally large IRs (51,118 bp each), a greatly reduced SSC region (1,817 bp), and a relatively shorter LSC region (72,779 bp). The expanded IR regions contain both small (rrn16) and large (rrn23) ribosomal RNA (rRNA) genes, seven tRNA genes typically found in streptophyte IRs, plus 23 additional protein-coding genes typically located in single-copy regions (fig. 2). Most remarkably, the rrn5 gene (5S rRNA) and the region homologous to the rrn4.5 gene in embryophytes (4.5S rRNA; in nonembryophyte streptophytes, the rrn4.5 gene-coding region forms an integral part of the 3′-end of the 23S ribosomal subunit) are absent from the genome (supplementary fig. S2, Supplementary Material online). The SSC region contains only a single gene (ccsA), whereas 59 protein-coding genes, and 21 tRNA genes, reside in the LSC region. Six ribosomal protein genes (rpl14, rpl16, rpl23, rps3, rps15, and rps16) usually present in streptophyte chloroplast genomes are missing, as are several other protein-coding genes (fig. 3). Two genes in the Klebsormidium genome, rps12 and psbA, require trans-splicing for correct protein translation. In total, genes coding for two rRNA, 28 tRNA, and 82 proteins were identified in the Klebsormidium chloroplast genome. The GC content of the genome is relatively high (42%) compared among streptophytes (average 37%) but differs substantially between the IRs and single-copy regions (46.0% and 36.5%, respectively). Mean intergenic spacer length was 358 bp (52,071 bp in total), with two conspicuous exceptions (6,340 and 4,231 bp). These two extended intergenic regions contain three unidentified ORFs (6,063, 1,785, and 1,425 bp), which had no strong matches (E value < 1e-4) among BLASTp searches of the refseq database or the custom ORF library. Group II introns were found in seven genes (table 2) and account for 3.7% of the total genome length. By comparison to the genome of Chara (Charophyceae) which has a larger overall size, the proportion of intergenic spacers and introns is several times lower, indicating that the large genome size of Klebsormidium can be attributed mainly to large IR regions.

Mesotaenium endlicherianum

The chloroplast genome of Mesotaenium was assembled as a circular sequence comprising 142,017 bp (fig. 1B, NCBI GenBank accession number KJ461681) and lacks a quadripartite structure, as do the two previously published Zygnematophyceae chloroplast genomes (namely Zygnema and Staurastrum; Turmel et al. 2005). The Mesotaenium genome contains 88 protein-coding, 4 rRNA, and 34 tRNA genes, and although it is 23 and 15 kb shorter than Zygnema and Staurastrum, respectively, it does not contain fewer genes (fig. 3). Intergenic spacers occupy almost one-third of the genome length (46,765 bp), with a mean intergenic distance of 357 bp. Group II introns were found in 12 genes, with clpP and ycf3 having two introns each (table 2), and the group I intron typically found in the streptophyte trnL-UAA gene is present. With an average size of 669 bp, introns of Mesotaenium are similar in length to those of bryophytes (713 bp) rather than the longer introns in the other two zygnematophycean chloroplast genomes (966 bp). The overall genome GC content (42%) is notably higher than in the other chloroplast genomes of Zygnematophyceae (32%) or land plants (37%).

FIG. 1. Chloroplast genome maps of Klebsormidium flaccidum (A), Mesotaenium endlicherianum (B), and Roya anglica (C).

FIG. 2. IR regions of Klebsormidium and Roya, in comparison to Chaetosphaeridium and Chara (charophytes), and Pellia (a bryophyte).
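The quadripartite layout reported for Klebsormidium can be checked with simple arithmetic: one LSC, one SSC, and two IR copies should add up to the assembled genome length. The sketch below is only an illustration of that check and is not part of the published pipeline; the function and constant names are assumptions introduced here, and only the four lengths are taken from the text.

```python
# Illustrative consistency check, not part of the published analysis.
# The region lengths (bp) are the values reported in the text for
# Klebsormidium; every name below is hypothetical.
LSC_BP = 72_779
SSC_BP = 1_817
IR_BP = 51_118            # length of ONE inverted repeat copy
REPORTED_TOTAL_BP = 176_832


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def ir_share(lsc: int, ssc: int, ir: int) -> float:
    """Fraction of the genome occupied by both IR copies together."""
    return 2 * ir / quadripartite_length(lsc, ssc, ir)


if __name__ == "__main__":
    total = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
    # Does the sum of the four regions match the reported genome length?
    print(total == REPORTED_TOTAL_BP)
    # Share of the genome that lies inside the two IR copies.
    print(f"{100 * ir_share(LSC_BP, SSC_BP, IR_BP):.1f}% of the genome lies in the IRs")
```

With the reported values the four regions sum to the stated genome length, and the two IR copies together make up more than half of the genome, which is in line with the statement that the large genome size of Klebsormidium is mainly attributable to its IRs.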
bs tra c t/6 /4 /8 9 7 /5 4 3 5 4 9 b y g u e s t o n 1 1 A p ril 2 0 1 9 FIG.3.—ChloroplastgenecontentamongcharophytesandaninferredHLPA.AllrRNAandprotein-codinggenesfoundwithinthesamplesetofthe phylogeneticanalysesareincluded.Genepresenceandabsenceareindicatedbyblueandorangeshading,respectively.Novelabsencesofgeneswith respecttoothercharophytegenomesarehighlightedinred.(Notethatthedisambiguationofycf2/ftsHhasbeennewlyinterpreted,seesupplementary tableS1,SupplementaryMaterialonline.) GenomeBiol.Evol.6(4):897–911. doi:10.1093/gbe/evu061 AdvanceAccesspublicationMarch28,2014 903 GBE Civa´nˇ etal. 1i66fcy – + – + – + + + Royaanglica 2i3fcy – + – + – + + + ThechloroplastgenomeofRoyawasreconstructedasacircu- larsequenceof138,275bpinlength(fig.1C,NCBIGenBank 1i3fcy – + + + + + – + accession number KJ461682), making it the shortest of the four zygnematophycean chloroplast genomes sequenced so 1i)CAU(Vnrt – – – – – + + + far (including Mesotaenium here). (One 10-bp region in the Roya genome hadonly 1Xcoverage: However, as thissmall 1i)UUU(Knrt + – + – + + + + stretchwasinanAT-richintergenicspacerandsurroundedby well-supportedpairedreads,wedidnotverifytheregionvia 1i)UAG(Inrt + – – – – + + + Sanger sequencing.) Unlike other zygnematophycean, the D Roya genome has a quadripartite architecture. The genome o w 1i)CCU(Gnrt – + + + + + + + sequence consists of SSC and LSC regions (20,213bp and nlo a 92,926bp, respectively) and a pair of IRs (12,568bp each). d e 1i)CGU(Anrt + – + – + + + + TheIRsofRoyabearsomeresemblancetoatypicalchloroplast d fro IR in terms of gene content—all genes of the rRNA operon m 1i61spr – – – + – + + + (rrn16–trnI-GAU–trnA-UGC–rrn23–rrn4.5–rrn5)arepresent— http although,theintegrityofthisoperonhasbeendisruptedand s 2i21spr – + – + – – – + the genes are merely neighboring units with jumbled order ://ac a and orientation (fig. 2 and supplementary fig. 
S3, d e 1i21spr + + + + + + + + m SupplementaryMaterialonline).Atleasttworearrangements ic wouldbenecessarytorestorethestandardorderoftherRNA .ou 1i1Copr – + – + – – + + p operon. The IRs of Roya also contain three additional tRNA .c o genes(trnR-ACG,trnP-GGG,andtrnL-UAG)andtwolonger m 1i61lpr – + – + + + + + /g ORFs (orf268 and orf230). The protein translation of orf268 b e 1i2lpr – + – + – + + + (g8e0n7ebfpro)mhasthheigchhlsoimroiplalraistyt(gEevnaolumee:5oef-a14c)htlooraonphIRy-cloecaanteadlgianet /article 1iAbsp + – – – – – – – Oedogonium cardiacum (Brouard et al. 2008). Because int -abs encodesaproteinbelongingtothefamilyoftyrosinerecom- tra c 1iDtep – + + + – – + + binases(Brouardetal. 2008),theproductoforf268wasla- t/6 beledasaputativerecombinase/integraseprotein.Inaddition, /4 /8 1iBtep + + + – – + + + orf268 has high similarity to ORF (46,439–46,717) in the 97/5 Anthoceros (a hornwort) chloroplast genome (E value: 1e- 4 3 1iDhdn + – – – – – – – 13)althoughthelatterisnotlocatedwithintheIRandissig- 54 9 1iBhdn – + – + – + + + merically. 
nbfryiafimcdaeenstcolyerfns2th3o0wrtie(t6hr,9s3ourbgf2pg6)e8sst.hinoTgwhteshahsteigcithomnsdiamyuilnanoriditteybnettoifiheocdmhloorerlooagpdolianusgst by gues A 1iAhdn – + – – – + + + dnu ORFs present in two ferns of the Ophioglossaceae, namely t on ndHLP 3iPplc + – – – – – – – elabele M3ea-1n8kyaunadch5eej-u0e6n,srisesapnedctOivpehlyio).glossumcalifornicum(Evalues: 11 Ap phytesa 2iPplc – + – – – – – + egenear conTthaeinsi2n8glteR-cNoApygerengeiso,n8s7ofprtohteeinR-ocyoadicnhglogroepnleass,tagnednotmwoe ril 2019 Charo 1iPplc – + – – – + + + hesam aredpdoitritoendalfoOrRoFtshweritZhyghnigehmsaimtoiplahriytcieesateo.Thhyepofitrhsettoicfatlhpersoeteainds- Distributionamong 1i1iAFmpetac +– +– +– –– +– –– m–– +– intronsoccurringint dSZolotyrictPfCiu3ouis1CnC0ap(pS,l00th6p3Oa6u9RsCFospsoifm,0f5Zioly4aSrg)rtfian2touye4frm5taSo,sattarahuu(Earmpasuvsttasar(liuEutgimevne:ivfi(a2rEcleeuavv-nee0a:tr9lus)es1e,i:emta-r4n1aileadn2-rs)1ictt6yhrai)ep.nttdRoaseTsecslloooa(nccRnuudTdss,) Table2Intron(GroupII) Klebsormidium Mesotaenium Roya Zygnema Staurastrum Chara Chaetosphaeridiu HLPA N.—MultipleOTE icogtnooertftnne2asgil6c)er,8qaswsupieneiastnhcRteaolyarryseamtonhmecoeacatunypnprisoyneurstgmee3grn0agec%lelseytnirfocoeofftudrnoiRtshdetTael-enliinmkcgeeeecnnhootolformafr23coe91tpi70vl(ai4tbsy1apt.n,:7gTdA0he4neisnobiimtnmp-tliieelkaisrner-, 904 GenomeBiol.Evol.6(4):897–911. doi:10.1093/gbe/evu061 AdvanceAccesspublicationMarch28,2014 GBE CharophyteChloroplastGenomes D o w n lo a d e d fro m h ttp s ://a c a d e m ic .o u p .c o m /g b e /a rtic le FIG.4.—Phylogeneticanalyses.(A)BayesianMCMCphylogeneticanalysesof83protein-codingchloroplastgenes:PhyloBayesCAT+gcpREV+(cid:2), -ab marginal likelihood: (cid:3)Lh¼244,645.3855. 
(B) Strict consensus tree of six most parsimonious trees (length 239, consistency index = 0.243, retention index = 0.786) resulting from analysis of the structural data (gene and intron content, operon structure). Numbers at nodes are posterior probabilities and nonparametric bootstrap values for (A) and (B), respectively. The nodes representing the HLPA are highlighted.

gene density to the economically packed chloroplast genome of Mesotaenium. However, the overall GC content of the Roya genome (33%) more closely resembles the base composition in Zygnema (31%) and Staurastrum (33%) than Mesotaenium (42%).

Phylogenetic Analyses

In figure 4A, a Bayesian MCMC analysis of the best-fitting model (PhyloBayes CAT+gcpREV+Γ4; −Lh = 244,645.3855) of amino acid data of the 83 proteins is presented. The tree shows strong support (>0.95 posterior probability) for the paraphyly of charophytes, with Klebsormidium branching early in the phylogenetic grade before Chara and Chaetosphaeridium, and with Zygnematophyceae as the sister group to land plants. Within the Zygnematophyceae, all relationships are strongly supported, with Mesotaenium forming the earliest-branching lineage, and Zygnema sister to a clade formed by Roya and Staurastrum. This finding conflicts with the traditional placement of Roya in the family Mesotaeniaceae but agrees with other phylogenetic reconstructions of conjugating algae (Gontcharov et al. 2003; Gontcharov and Melkonian 2010).

Parsimony analysis of the structural data (gene and intron content, and operon structure) identified six optimal trees (tree length 239 steps, consistency index 0.243, retention index 0.786); the strict consensus tree is presented in figure 4B. Nonparametric parsimony bootstrap analysis of the structural data poorly supports a monophyletic Zygnematophyceae (54% bootstrap proportion [BP]) but strongly supports the sister-group relationship between Roya and Staurastrum (97% BP). The streptophytes as a whole are well supported (98% BP), with Mesostigma and Chlorokybus forming the earliest-branching lineage. The remaining streptophytes form a well-supported (100% BP) clade within which Klebsormidium is the first diverging lineage (83% BP). Relationships among Chara, Chaetosphaeridium, Zygnematophyceae, and the land plant clade (itself 87% BP) are unsupported (or negligibly supported, <70% BP), but the topology is nevertheless congruent with that of the protein tree.

Comparisons between Charophyte and Land Plant Chloroplast Genomes

The protein-coding gene complements of the Klebsormidium, Roya, and Mesotaenium chloroplast genomes are summarized in figure 3. The chloroplast genome of Klebsormidium, with a repertoire of only 82 unique protein-coding genes, has the lowest protein-coding gene content of any charophyte plastid genome reported to date. Hence, the Klebsormidium chloroplast gene set is more dissimilar to the estimated content of the HLPA (22 presence/absence differences) than are the genomes of Mesostigma or Chlorokybus (18 and 16 presence/absence differences, respectively). In contrast, the taxa with gene complements most closely resembling the HLPA are Roya and Chaetosphaeridium, with eight presence/absence differences each; the other three Zygnematophyceae each have one additional difference. Comparisons of the presence and absence of group II chloroplast introns show that Chaetosphaeridium is the most similar to the HLPA, with 17 introns at congruent positions (table 2). However, Mesotaenium is the next-most similar, with 16 introns at common positions, and also has the clpP intron 2 that is common in land plant chloroplast genomes but has not previously been found in charophyte algae.

When the operons (polycistronic units) of charophyte chloroplast genomes are compared with those of land plants, the operonic complements of Chara and Chaetosphaeridium show greater similarity to the HLPA (12 and 13 identical operons, respectively) than do Zygnematophyceae (11 or fewer identical operons). The operonic organization of Roya is the next-most similar to the HLPA (11 concordant operons), whereas the other three Zygnematophyceae bear as few operons of early land plants as do more distantly related streptophyte algae, such as Klebsormidium. This lack of maintenance of operonic integrity among Zygnematophyceae (excepting Roya) is consistent with the high number of implied genome rearrangements identified by MGR analysis (supplementary fig. S4, Supplementary Material online). The syntenic structure of the Staurastrum chloroplast genome implies 20 and 23 rearrangements to match the gene order in Pellia (liverwort) and Isoetes (lycopod), respectively. Roya appears to have the fewest rearrangements among the known zygnematophycean chloroplast genomes, with a minimum of 18 and 21 changes implied by comparison to the Pellia and Isoetes genomes, respectively. However, Roya and Staurastrum are also highly rearranged with respect to each other, with 18 implied rearrangements. An even greater number of rearrangements separates the chloroplast genome of Pellia from those of Mesotaenium and Zygnema (at least 25 and 32 changes, respectively). By comparison, the gene order of Chaetosphaeridium is more similar to that of land plants than are those of other charophytes and requires as few as 10 changes to match the operonic organization of Pellia and Isoetes.

Although the abundance of short sequence repeats has previously been implicated as a possible mediator of genome rearrangements, numbers of short repeats are not exceptionally high in the two zygnematophycean genomes reported here. In Klebsormidium and Roya, short sequence repeats were relatively rare (0.24 and 0.38 repeats/kb, respectively) and similar to the numbers found in Chaetosphaeridium, Staurastrum, Mesostigma, and Chlorokybus (all <1 repeat/kb). A greater number (1.68 repeats/kb) was recorded in Mesotaenium; however, this is still fewer than in Chara and Zygnema (3.16 and 25.73 repeats/kb, respectively).

Discussion

New Insights into the Chloroplast Genomics of Charophytes

The absence from the Klebsormidium chloroplast genome of the 5S rRNA gene, and of the region at the 3′-end of the 23S rRNA gene that is homologous to the 4.5S rRNA gene of embryophytes, is the first report of an incomplete set of rRNA genes in either chloroplast or mitochondrial genomes; even within the greatly reduced chloroplast genomes of parasitic plants, the rRNA operon remains intact (Krause 2008). Because the usual complement of rRNA subunit genes is assumed vital to the assembly and function of ribosomes, it seems likely that the 4.5S homologous region and 5S rRNA genes have been translocated to the nuclear genome of Klebsormidium and that their products are imported into the chloroplast stroma. Nevertheless, multiple losses among eukaryotes of the 5S gene in the mitochondrion (Adams and Palmer 2003) suggest that complete loss of these ribosomal subunits cannot be entirely ruled out. If the assumption of rRNA gene translocation to the nucleus is correct, chloroplast-directed rRNA import renders plastid protein synthesis in Klebsormidium ultimately dependent on the nucleus and raises questions concerning the mechanisms of inter-compartmental RNA trafficking. Additionally, if the 4.5S rRNA is being imported into the chloroplast, then it is also acting as a separate 4.5S rRNA species, as in the embryophytes, and is not an integral part of the 23S rRNA, as is implied by its annotation in nonembryophyte streptophyte chloroplast genomes. The transport of nuclear mRNA into the chloroplast is known to occur (Nicolaï et al. 2007), and indirect evidence suggests that tRNAs are imported from outside the plastid in some parasitic plants (Bungard 2004). However, to date, the import of rRNA into the chloroplast has not been demonstrated. Although the mechanism(s) of chloroplast-directed RNA import remain uncharacterized, two candidate pathways are currently considered plausible. First, the import of rRNA into the chloroplast could be facilitated by a protein precursor utilizing the protein import pathway, as is the case for tRNA transport into mitochondria (Schneider 2011). Alternatively, short noncoding RNA sequences may be responsible for chloroplast localization of nuclear transcripts (Gómez and Pallás 2010). In either case, the chloroplast genome of Klebsormidium is unusual in lacking the 5S rRNA gene, the 4.5S-homologous region, and six ribosomal protein genes typically present in streptophyte chloroplast genomes, and it displays a unique dependency on the nucleus for chloroplast protein synthesis.
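The gene-content comparisons reported under "Comparisons between Charophyte and Land Plant Chloroplast Genomes" rest on counting presence/absence differences, that is, genes found in one complement but not the other. A minimal Python sketch of that count follows. It assumes each complement can be treated as a plain set of gene names; the function name, the reference set, and the two taxa are hypothetical placeholders, not the complements or the software used in the article.

```python
"""Count presence/absence differences between gene complements (toy data)."""


def presence_absence_differences(complement_a, complement_b):
    """Return the number of genes found in exactly one of the two complements."""
    return len(set(complement_a) ^ set(complement_b))


# Hypothetical reference complement standing in for an inferred ancestor.
# The names resemble chloroplast genes, but the sets are illustrative only.
reference = {"rps12", "rpl16", "psbA", "petD", "ndhB", "clpP"}

# Hypothetical taxa, not the complements reported in the article.
taxa = {
    "taxon_x": {"rps12", "rpl16", "psbA", "petD", "ndhB"},  # lacks clpP
    "taxon_y": {"rps12", "psbA", "clpP", "orfX"},  # lacks three genes, has one extra
}

differences = {
    name: presence_absence_differences(genes, reference)
    for name, genes in taxa.items()
}

# Taxa ordered from most to least similar to the reference complement.
ranking = sorted(differences, key=differences.get)

for name in ranking:
    print(f"{name}: {differences[name]} presence/absence differences")
```

The symmetric difference counts a gain and a loss equally, which matches the way a single "presence/absence differences" figure is quoted per taxon; a weighted scheme would need separate counts for each direction.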