GBE

Dynamic Evolution of the Chloroplast Genome in the Green Algal Classes Pedinophyceae and Trebouxiophyceae

Monique Turmel*, Christian Otis, and Claude Lemieux

Département de Biochimie, de Microbiologie et de Bio-Informatique, Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, Québec, Canada

*Corresponding author: E-mail: [email protected].

Accepted: June 28, 2015

Data deposition: The project has been deposited at GenBank under the accessions KM462860–KM462888.

Abstract

Previous studies of trebouxiophycean chloroplast genomes revealed little information regarding the evolutionary dynamics of this genome because taxon sampling was too sparse and the relationships between the sampled taxa were unknown. We recently sequenced the chloroplast genomes of 27 trebouxiophycean and 2 pedinophycean green algae to resolve the relationships among the main lineages recognized for the Trebouxiophyceae. These taxa and the previously sampled members of the Pedinophyceae and Trebouxiophyceae are included in the comparative chloroplast genome analysis we report here. The 38 genomes examined display considerable variability at all levels, except gene content. Our results highlight the high propensity of the rDNA-containing large inverted repeat (IR) to vary in size, gene content and gene order as well as the repeated losses it experienced during trebouxiophycean evolution. Of the seven predicted IR losses, one event demarcates a superclade of 11 taxa representing 5 late-diverging lineages. IR expansions/contractions account not only for changes in gene content in this region but also for changes in gene order and gene duplications. Inversions also led to gene rearrangements within the IR, including the reversal or disruption of the rDNA operon in some lineages. Most of the 20 IR-less genomes are more rearranged compared with their IR-containing homologs and tend to show an accelerated rate of sequence evolution. In the IR-less superclade, several ancestral operons were disrupted, a few genes were fragmented, and a subgroup of taxa features a G+C-biased nucleotide composition. Our analyses also unveiled putative cases of gene acquisitions through horizontal transfer.

Key words: Trebouxiophyceae, Pedinophyceae, plastid genomics, genome rearrangements, inverted repeat, horizontal transfer, repeats, introns.

© The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Genome Biol. Evol. 7(7):2062–2082. doi:10.1093/gbe/evv130 Advance Access publication July 1, 2015

Introduction

Chloroplasts are semiautonomous organelles that possess their own genome; with the assistance of chloroplast-targeted products encoded in the nucleus, they carry out the reactions necessary for the capture of energy from the sun as well as other functions (Gray and Archibald 2012). The chloroplasts of the photosynthetic eukaryotes belonging to the Archaeplastida or Plantae sensu lato (red algae, glaucophytes, and viridiplants) originate from a primary endosymbiosis event involving a cyanobacterium and a nonphotosynthetic eukaryote (Palmer 2003; Rodriguez-Ezpeleta et al. 2005; Gray and Archibald 2012). Although the number of retained cyanobacterial genes varies according to the lineage, the chloroplast genomes of Archaeplastida share many cyanobacterial-like operons and, except for those of red algae, generally contain two copies of a large inverted repeat (IR) encoding the rRNA operon (Green 2011).

How the chloroplast genome is changing through time is best understood for land plants, a branch of the Viridiplantae (green algae and land plants) that emerged about 450 Ma (Jansen and Ruhlman 2012; Wolf and Karol 2012). Studies of a large number of land plant chloroplast DNAs (cpDNAs) (mostly from seed plants) have uncovered the highly conservative nature of this organelle genome. The vast majority of seed plant cpDNAs is 107–218 kb in size and their 101–118 genes, which are interrupted by 21 introns, are dispersed among the IR and the large and small single-copy (LSC and SSC) regions with a nearly identical gene partitioning pattern (Jansen and Ruhlman 2012). The IR has been lost occasionally during land plant evolution; at least five independent losses have been documented in seed plants (Jansen and Ruhlman 2012). Although chloroplast gene order has been maintained over long evolutionary periods, extensive gene rearrangements have occurred in some angiosperm lineages (Jansen and Ruhlman 2012). The most common events underlying changes in land plant cpDNA architecture include alterations in gene order through sequence inversions (reversals) and the contraction/expansion of the IR.

The Viridiplantae also comprise the green algae, which are divided between the Streptophyta and Chlorophyta. The streptophyte algae or charophytes are the closest relatives of land plants and their chloroplast genome shares many similarities with their land plant counterparts (Turmel et al. 2007). Compared with their streptophyte homologs, chlorophyte chloroplast genomes exhibit a much greater diversity in genome and gene organization. To date, fewer than 30 chlorophyte chloroplast genomes have been described in the literature: They range from 64 to 525 kb in size, encode 88–128 standard genes (Lang and Nedelcu 2012), and a number of those containing a large IR display large deviations from the ancestral pattern of gene partitioning among the single-copy regions (in particular, chlorophycean and ulvophycean genomes), which is observed in streptophytes, some prasinophyceans, and the pedinophycean Pedinomonas minor (Maul et al. 2002; Pombert et al. 2005, 2006; de Cambiaire et al. 2006; Robbens et al. 2007; Brouard et al. 2008; Smith and Lee 2009; Turmel, Gagnon, et al. 2009; Smith et al. 2010; Lemieux et al. 2014b). Additional genomic changes experienced by chlorophyte chloroplast genomes include the loss of the IR, extensive gene rearrangements, expansion of gene and intergenic sequences, invasion by repeat elements and introns, acquisition of foreign genes by horizontal transfer, changes in nucleotide composition, and gene fragmentation (Maul et al. 2002; Bélanger et al. 2006; Pombert et al. 2005, 2006; de Cambiaire et al. 2006, 2007; Robbens et al. 2007; Turmel et al. 2008; Smith and Lee 2009; Turmel, Gagnon, et al. 2009; Turmel, Otis, et al. 2009; Brouard et al. 2008, 2011, 2010; Smith et al. 2010, 2011; Lemieux et al. 2014b).

Considering that the chloroplast genome experienced tremendous alterations within the Chlorophyta and that only a few taxa have been investigated in each of the major lineages of this division, it is still unclear what were the ancestral conditions of these lineages and whether distinct lineages differ in their evolutionary patterns. Of course, knowledge of the branching order among and within the main chlorophyte lineages is required to infer what genomic changes accompanied the emergence of new lineages. In this regard, the phylogeny of chlorophytes is in constant flux (Leliaert et al. 2012; Marin 2012; Fucikova et al. 2014; Lemieux et al. 2014a), and at this time, it is thought that the first branches of the Chlorophyta are occupied by prasinophycean lineages, with prasinophycean clade VII being sister to all core chlorophytes (Pedinophyceae+Chlorodendrophyceae+Chlorellales+Trebouxiophyceae+Ulvophyceae+Chlorophyceae).

The Trebouxiophyceae is a species-rich class of the Chlorophyta that exhibits numerous lineages as evidenced by 18S rDNA analyses and displays remarkable variation in morphology and ecology (Lewis and McCourt 2004; Friedl and Rybalka 2012; Leliaert et al. 2012). It includes several species participating in symbiosis with fungi to form lichens, photosynthetic symbionts in ciliates, metazoan and plants, as well as species that have lost photosynthetic capacity. To identify the interrelationships between the major clades of trebouxiophyceans and gain information on the evolutionary history of the chlorophyte chloroplast genome, we recently sequenced the chloroplast genomes of 27 trebouxiophyceans and two pedinophyceans, thus bringing to 3 and 35 the total number of photosynthetic taxa analyzed for their chloroplast genome in the Pedinophyceae and Trebouxiophyceae (Lemieux et al. 2014a). Phylogenetic analyses of 79 cpDNA-encoded proteins and genes from 61 chlorophytes, including the 38 pedinophyceans and trebouxiophyceans, revealed that the Trebouxiophyceae is not monophyletic. Two major clades containing trebouxiophyceans were identified: A clade of 29 core trebouxiophycean taxa that is sister to the Ulvophyceae and Chlorophyceae, and a clade comprising the Chlorellales and Pedinophyceae that is sister to the core trebouxiophyceans+Ulvophyceae+Chlorophyceae (see fig. 1). Like most of the chlorellaleans, early-diverging core trebouxiophyceans are predominantly planktonic species, whereas core trebouxiophyceans occupying later-diverging lineages are mostly terrestrial or aeroterrestrial algae.

In this study, we report the structural features of the 29 newly sequenced chloroplast genomes that were used to reconstruct the abovementioned phylogenies and include in our comparative genome analysis the previously sampled members of the Pedinophyceae and Trebouxiophyceae (Wakasugi et al. 1997; de Cambiaire et al. 2007; Turmel, Otis, et al. 2009; Smith et al. 2011; Servin-Garciduenas and Martinez-Romero 2012). We sought to identify the main genomic changes that occurred in the various lineages investigated. The examined genomes display considerable variability at all levels except gene content. Our results highlight the high propensity of the rDNA-containing IR to vary in size, gene content and gene order, and the repeated losses it experienced during trebouxiophycean evolution. Overall, the structural genomic data provide independent support for many of the relationships we identified in our previous phylogenomic study.

FIG. 1. Phylogenetic relationships among the 38 core chlorophytes examined in this study and total lengths of coding, intronic, intergenic, and small repeated sequences (>30 bp) in their chloroplast genomes. The presence of a large IR encoding rRNA genes is also indicated. The best-scoring ML tree that Lemieux et al. (2014a) inferred from 79 cpDNA-encoded proteins under the GTR+Γ4 model is presented. Note that intron-encoded genes were not considered as coding sequences but rather as intron sequences and that the O. solitaria, P. brevispinosa, and T. aggregata genome sequences are not complete.

Materials and Methods

Source and Annotations of Chloroplast Genomes

The pedinophycean and trebouxiophycean chloroplast genomes compared in this study are those that were used to construct the phylogenies recently reported by Lemieux et al. (2014a). GenBank accession numbers for all 38 genomes are provided in table 1. All genomes are available as complete genome sequences, except those of the core trebouxiophyceans Oocystis solitaria (1 contig), Pleurastrosarcina brevispinosa (1 contig), and Trebouxia aggregata (41 contigs).
The presence of abundant repeats in the T. aggregata genome prevented us from assembling the complete sequence.

The methods that were used to generate and annotate the 29 chloroplast genomes are described in Lemieux et al. (2014a). The same methods were employed to reannotate previously described genomes to produce very high quality annotations. Coding sequences of nonstandard chloroplast genes were identified as follows: Free-standing open-reading frames (ORFs) of more than 100 codons were obtained using GETORF in EMBOSS 6.6.0 (Rice et al. 2000) and their translated products were subjected to BLASTP similarity searches against the nonredundant database at the National Center for Biotechnology Information (NCBI) (http://blast.ncbi.nlm.nih.gov/Blast.cgi, last accessed July 14, 2015). Only the ORFs with similarities to genes of known function were annotated. Intron types and boundaries were determined by modeling intron secondary structures (Michel et al. 1989; Michel and Westhof 1990) and by comparing intron-containing genes with intronless homologs. Circular and linear genome maps were drawn with OGDraw (Lohse et al. 2007).

Table 1
GenBank Accession Numbers and Main Features of the Chloroplast Genomes Examined in This Study

| Taxon | Accession No.^a | A+T (%) | Genome size (bp) | IR (bp) | LSC (bp) | SSC (bp) | Genes (no.)^b | Introns (GI GII)^c | Repeats (%)^d |
|---|---|---|---|---|---|---|---|---|---|
| Marsupiomonas sp. NIES 1824 | KM462870* | 59.7 | 94,262 | 9,926 | 68,185 | 6,225 | 106 | | 0.3 |
| Pedinomonas tuberculata | KM462867* | 66.6 | 126,694 | 16,074 | 86,619 | 7,927 | 107 | 5 5 | 2.1 |
| Pedinomonas minor | NC_016733 | 65.2 | 98,340 | 10,639 | 70,398 | 6,664 | 106 | | 0 |
| Dicloster acuatus | KM462885* | 70.0 | 169,201 | 22,061 | 87,535 | 37,544 | 112 | 6 | 5.4 |
| Parachlorella kessleri | NC_012978 | 70.0 | 123,994 | 10,913 | 88,297 | 13,871 | 112 | 1 | 4.0 |
| Pseudochloris wilhelmii | KM462886* | 63.3 | 109,775 | 12,798 | 66,211 | 17,968 | 113 | 1 | 4.2 |
| Marvania geminata | KM462888* | 61.8 | 108,470 | | | | 113 | 1 | 3.0 |
| Chlorella vulgaris | NC_001865 | 68.4 | 150,613 | | | | 113 | 3 | 7.3 |
| Chlorella variabilis | NC_015359 | 65.9 | 124,579 | | | | 113 | 3 | 2.4 |
| Koliella corcontica | KM462874* | 72.0 | 117,543 | 15,891 | 77,346 | 8,415 | 105 | 8 (+1?) | 11.6 |
| Geminella terricola | KM462881* | 67.3 | 187,843 | 18,786 | 139,317 | 10,954 | 109 | 1 (+1?) | 22.7 |
| Geminella minor | KM462883* | 72.1 | 129,187 | 11,970 | 95,317 | 9,930 | 108 | 1 1 | 3.2 |
| Gloeotilopsis sterilis | KM462877* | 70.5 | 132,626 | 13,730 | 95,069 | 10,097 | 109 | 2 1 | 5.1 |
| Oocystis solitaria | FJ968739^e | 71.0 | >96,287 | >378^f | 71,295 | | 110 | 1 1 | 0.7 |
| Planctonema lauterbornii | KM462880* | 66.8 | 114,128 | 10,577 | 81,906 | 11,068 | 111 | 1 | 7.3 |
| Pleurastrosarcina brevispinosa | KM462875*^e | 65.5 | >295,314 | 45,468 | >194,027^g | 10,351 | 111 | 16 3 | 21.3 |
| Neocystis brevis | KM462873* | 68.6 | 211,747 | | | | 112 | 5 | 19.8 |
| Stichococcus bacillaris | KM462864* | 68.1 | 116,952 | 8,272 | 51,357 | 49,051 | 107 | 4 1 | 4.3 |
| Prasiolopsis sp. SAG 84.81 | KM462862* | 64.9 | 306,152 | | | | 108 | 7 1 | 23.1 |
| "Chlorella" mirabilis | KM462865* | 68.5 | 167,972 | 6,835 | 121,087 | 33,215 | 110 | | 5.5 |
| Koliella longiseta | KM462868* | 68.6 | 197,094 | 10,619 | 141,677 | 34,179 | 111 | | 4.0 |
| Pabia signiensis | KM462866* | 66.6 | 236,463 | 27,336 | 141,652 | 40,139 | 111 | | 20.0 |
| Parietochloris pseudoalveolaris | KM462869* | 68.4 | 145,947 | 6,786 | 115,976 | 16,399 | 109 | | 6.8 |
| Leptosira terrestris | NC_009681 | 72.7 | 195,081 | | | | 107 | 4 | 4.8 |
| Xylochloris irregularis | KM462872* | 60.3 | 181,542 | 28,473 | 76,371 | 48,225 | 110 | 15 | 7.1 |
| Microthamnion kuetzingianum | KM462876* | 65.3 | 158,609 | | | | 107 | | 16.7 |
| Fusochloris perforata | KM462882* | 64.9 | 148,459 | | | | 107 | | 3.5 |
| Trebouxia aggregata | EU123962–EU124002^e | 65.2 | >245,724 | | | | 100 | 8 | 42.7 |
| Myrmecia israelensis | KM462861* | 69.6 | 146,596 | | | | 112 | | 3.6 |
| Lobosphaera incisa | KM462871* | 72.2 | 156,031 | | | | 111 | 1 | 1.4 |
| Dictyochloropsis reticulata | KM462860* | 64.1 | 289,394 | | | | 111 | 3 5 | 19.7 |
| Watanabea reniformis | KM462863* | 58.8 | 201,425 | | | | 110 | 6 1 | 23.0 |
| Choricystis minor | KM462878* | 54.6 | 94,206 | | | | 111 | | 0 |
| Botryococcus braunii | KM462884* | 57.6 | 172,826 | | | | 112 | 1 2 | 9.8 |
| Elliptochloris bilobata | KM462887* | 54.2 | 134,677 | | | | 110 | 3 | 15.1 |
| Trebouxiophyceae sp. MX-AZ01 | NC_018569 | 42.3 | 149,707 | | | | 114 | 4 | 0.9 |
| Coccomyxa subellipsoidea | NC_015084 | 49.2 | 175,731 | | | | 114 | 1 | 10.6 |
| Paradoxia multiseta | KM462879* | 49.4 | 183,394 | | | | 114 | 14 | 18.6 |

a The asterisks denote the 29 genomes sequenced by Lemieux et al. (2014a) and described here for the first time.
b Intronic genes and freestanding ORFs not usually found in green plant chloroplast genomes are not included in these values. Duplicated genes were counted only once.
c Number of group I (GI) and group II (GII) introns is given.
d Nonoverlapping repeat elements were mapped on each genome with RepeatMasker using the repeats ≥30 bp identified with REPuter as input sequences.
e Because the Oocystis solitaria, Pleurastrosarcina brevispinosa, and Trebouxia aggregata chloroplast genomes are partially sequenced, the values reported for their sizes represent underestimates and those corresponding to other genomic features may be inaccurate.
f The exact sizes of the O. solitaria IR and SSC regions could not be determined because the IR/SSC junction has not been identified.
g The size of the P. brevispinosa LSC region was underestimated because this region contains a sequencing gap.

Analyses of Gene Organization

The sidedness index ($C_s$) was determined as described by Cui et al. (2006) using the formula $C_s = (n - n_{SB})/(n - 1)$, where $n$ is the total number of standard genes in the genome and $n_{SB}$ is the number of sided blocks, that is, the number of blocks including adjacent genes on the same strand.

Alignments of whole genomes from taxa belonging to selected clades were carried out using the ProgressiveMauve algorithm of Mauve 2.3.1 (Darling et al. 2010) after removal of one copy of the IR from the IR-containing genomes. The numbers of reversals separating all genome pairs in these clades were estimated with MGR 2.03 (Bourque and Pevzner 2002) using the permutation matrix file generated by Mauve, which records the order and orientation of locally collinear blocks.

The ancestral genomic reconstruction option of MLGO (Maximum Likelihood for Gene-Order Analysis) (Hu et al. 2014) was employed to predict the order of the 91 genes shared by all compared genomes at each internode of the amino acid-based phylogeny previously inferred by Lemieux et al. (2014a). The gene order matrix we analyzed took gene polarity into account and contained only one copy of the IR sequence and of other duplicated gene loci within the IR or SC region. The numbers of reversals separating the internal and terminal nodes of the genome rearrangement tree were computed using GRIMM 2.01 (Tesler 2002). For comparison of branch lengths, the genome rearrangement tree was scaled using Ktreedist (Soria-Carrasco et al. 2007) so that its global divergence was as similar as possible to that of the protein tree.

A maximum-likelihood (ML) tree was inferred using the phylogeny reconstruction option of MLGO and a matrix of gene order containing all standard genes, including copies of duplicated genes. Confidence of branch points was estimated by 1,000 bootstrap replications.

We used a custom-built program to identify the regions that display the same gene order in the compared chloroplast genomes. Gene order in each genome was converted to all possible pairs of signed genes and the presence/absence of the gene pairs shared by two or more genomes was coded as binary characters in Mesquite 3.01 (Maddison WP and Maddison DR 2015).
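The two gene-order computations described in this section, the sidedness index and the coding of shared signed gene pairs as binary characters, can be illustrated with a minimal Python sketch. It is not the custom-built program used in the study. It assumes that "pairs of signed genes" means pairs of adjacent genes, that a pair read from the opposite strand counts as the same pair, and that each map is treated as linear (the junction between the last and first gene of a circular map is ignored). The function names and the three toy gene orders are hypothetical.

```python
from collections import Counter
from itertools import groupby

# A gene order is a list of (gene_name, strand) tuples in map order,
# with strand = +1 or -1. All names and data here are illustrative.


def sidedness_index(order):
    """Return C_s = (n - n_SB) / (n - 1) for one linearized gene order."""
    n = len(order)
    if n < 2:
        raise ValueError("need at least two genes")
    # A sided block is a maximal run of adjacent genes on the same strand.
    n_sb = sum(1 for _ in groupby(strand for _, strand in order))
    return (n - n_sb) / (n - 1)


def signed_pairs(order):
    """Return the set of adjacent signed gene pairs, strand-independent."""
    pairs = set()
    for (g1, s1), (g2, s2) in zip(order, order[1:]):
        forward = ((g1, s1), (g2, s2))
        # The same adjacency as seen when reading the opposite strand.
        reverse = ((g2, -s2), (g1, -s1))
        pairs.add(min(forward, reverse))  # keep one canonical form
    return pairs


def shared_pair_matrix(genomes):
    """Code pairs found in two or more genomes as 0/1 characters."""
    pair_sets = {name: signed_pairs(order) for name, order in genomes.items()}
    counts = Counter(p for pairs in pair_sets.values() for p in pairs)
    shared = sorted(p for p, c in counts.items() if c >= 2)
    matrix = {
        name: [int(p in pairs) for p in shared]
        for name, pairs in pair_sets.items()
    }
    return shared, matrix


# Toy data (assumption): B is A read from the opposite strand, C is rearranged.
genomes = {
    "A": [("psbA", 1), ("rbcL", 1), ("atpB", -1)],
    "B": [("atpB", 1), ("rbcL", -1), ("psbA", -1)],
    "C": [("psbA", 1), ("atpB", 1), ("rbcL", 1)],
}
shared, matrix = shared_pair_matrix(genomes)
for name, row in matrix.items():
    print(name, row, round(sidedness_index(genomes[name]), 2))
```

Storing each pair in a canonical orientation is one way to make the characters independent of the strand from which a map happens to be read; a binary matrix of this kind could then be exported to a character-tracing program.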
Gains/losses of gene pairs that occurred during the evolution of pedinophycean and trebouxiophycean taxa were identified by tracing these characters with MacClade 4.08 (Maddison DR and Maddison WP 2000) under the Dollo principle of parsimony on the tree topology inferred by Lemieux et al. (2014a).

Analyses of Repetitive Sequences

To estimate the proportion of repeated sequences in individual chloroplast genomes, repeats ≥30 bp were retrieved using the REPFIND program of REPuter 2.74 (Kurtz et al. 2001) with the options -f (forward) -p (palindromic) -l (minimum length = 30 bp) -allmax and then masked on the genome sequence using REPEATMASKER (http://www.repeatmasker.org/, last accessed July 14, 2015) running under the cross_match search engine (http://www.phrap.org/, last accessed July 14, 2015). The repeats identified by BLASTN 2.2.30+ searches of each chloroplast genome against itself (word size = 30) were defined into distinct elements using RECON 1.08 (Bao and Eddy 2002) and these elements were then classified in different groups of size intervals. The G+C contents of the repeated and unique sequences within each chloroplast genome were calculated from the outputs of REPEATMASKER that were generated with the -xsmall option (under this option the repeat regions are returned in lower case and nonrepetitive regions in capitals in the masked file).

G+C Content of Protein-Coding Genes

The G+C content of protein-coding genes was determined at each codon position using DAMBE (Xia 2013) and the concatenated nucleotide data set (79 genes, 15,468 codons) of Lemieux et al. (2014a).

Results

A+T Content, Genome Size, and Presence/Absence of IR

The maps of the 29 newly sequenced chloroplast genomes are shown in supplementary figure S1, Supplementary Material online, along with those previously reported for their homologs in the Pedinophyceae and Trebouxiophyceae. The main structural features of these genomes are summarized in table 1. All 38 compared chloroplast genomes, except three from the Trebouxiophyceae (T. aggregata, P. brevispinosa, and O. solitaria), have been completely sequenced. As expected, most of these genomes are rich in A+T (table 1). Only six, all from core trebouxiophyceans belonging to the Elliptochloris+Choricystis clade, have an A+T content of less than 58.0% and among these, the most G+C-biased genome, with 42.3% A+T, is that of Trebouxiophyceae sp. MX-AZ01.

Total chloroplast genome size in our study group ranges from 94,206 bp (in Choricystis minor) to 306,152 bp (in Prasiolopsis sp. SAG 84.81) and varies markedly within some individual trebouxiophycean lineages (table 1 and fig. 1). For example, in the Prasiola clade, the genome of the minute alga Stichococcus bacillaris is 2.6-fold smaller than that of its closest relative, Prasiolopsis sp. SAG 84.81. Most of the genomes smaller than 150 kb are found in the Pedinophyceae, Chlorellales, and the Geminella+Oocystis clade, whereas those larger than 190 kb are restricted to core trebouxiophycean lineages that diverged after the Geminella+Oocystis clade.

Only 18 of the 38 compared chloroplast genomes possess a large IR, a part of which encodes the rRNA genes (table 1 and fig. 1). Taxa lacking such an IR are found in the Chlorellales (three of the six algae sampled from this clade) and in core trebouxiophycean lineages that diverged after the Geminella+Oocystis clade (three of the six algae examined in the Prasiola clade, three of the four representatives of the Microthamniales+Xylochloris clade, and all members from the superclade containing the Trebouxiales and the Lobosphaera, Watanabea, Choricystis, and Elliptochloris clades). The IR shows important fluctuation in size both among and within lineages (fig. 2). The smallest (6.8 kb) and largest (45.5 kb) IRs are found in core trebouxiophyceans representing independent lineages: Parietochloris pseudoalveolaris and P. brevispinosa, respectively. Among the lineages represented by multiple taxa, the Prasiola clade displays the most important IR size variation (4-fold). Note here that a member of this clade, S. bacillaris, exhibits two copies of an 8,272-bp sequence that are inverted relative to one another and separated by similar sized single-copy regions; however, this IR lacks the rRNA genes (table 1 and supplementary fig. S1, Supplementary Material online).

Like the IR size, the proportion of noncoding sequences (i.e., introns and intergenic regions) in the examined cpDNAs is highly variable both among and within lineages (fig. 1). The intergenic regions, which represent up to 68% of the genome (in Prasiolopsis), are the noncoding sequences contributing the most to the observed genome size variation. The largest genomes (>200 kb) generally contain not only the highest amount of intergenic regions but also the greatest abundance of repeats of more than 30 bp (table 1, fig. 1, and supplementary fig. S2, Supplementary Material online). With 42.7% of small repeats, the genome of the lichen symbiotic T. aggregata, whose 41 contigs total 245.7 kb, is the most repeat-rich genome identified in our study. In general, the genomes with the greatest proportions of repeated sequences contain the largest numbers of distinct repeat elements (supplementary fig. S2, Supplementary Material online). In any given genome, small repeats are generally heterogeneous in size, composed of direct as well as palindromic sequences, and richer in G+C content than unique sequences (supplementary fig. S2, Supplementary Material online).
The proportions of distinct repeats assigned to six categories of size intervals (30–39, 40–59, 60–89, 90–149, 150–249, and >250 bp) reveal that the distribution of repeat sizes is variable among and within lineages (supplementary fig. S2, Supplementary Material online).

FIG. 2. Gene organization of the large IRs in the chloroplast genomes examined in this study. Coding sequences of the rRNA genes are represented in red and, for all the IRs featuring an ancestral rDNA operon, the direction of transcription of this operon is shown by an arrow. The O. solitaria IR is not represented because its extent remains unknown. All gene maps are drawn to scale.

Standard Genes

The 35 completely sequenced cpDNAs contain 105–114 unique standard genes, that is, genes usually present in chloroplast genomes (table 1). Included in this category are all identified tRNA genes, even though four of these genes (trnK(cuu), trnL(aag), trnP(ggg), and trnR(ccu)) occur rarely in chlorophyte cpDNAs (fig. 3). All genomes share a set of 91 genes coding for three rRNAs (rrs, rrl, and rrf), 25 tRNAs, and 63 proteins (see legend of fig. 3). As expected, virtually all standard genes in IR-less genomes are present in one copy; the only exceptions are the Marvania geminata trnG(gcc) and Prasiolopsis rrf genes, which occur in two identical and nonidentical copies (106/121 identity), respectively. These duplicated copies of these two genes may represent remnants of an ancestral IR. Note that trnG(gcc) is located near the IR/LSC boundary in the closely related alga Pseudochloris wilhelmii.
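The distinction between the genes shared by all genomes and the genes missing from one or more genomes amounts to a set intersection over the per-genome repertoires. The short Python sketch below illustrates this under stated assumptions: the taxon labels and repertoires are toy placeholders rather than data from this study, and the function name is hypothetical.

```python
def core_and_variable(repertoires):
    """Return (core, variable, missing) for a dict of genome -> gene names.

    core: genes present in every genome
    variable: genes absent from at least one genome
    missing: for each genome, the variable genes it lacks
    """
    gene_sets = [set(genes) for genes in repertoires.values()]
    core = set.intersection(*gene_sets)  # requires at least one genome
    variable = set.union(*gene_sets) - core
    missing = {
        name: sorted(variable - set(genes))
        for name, genes in repertoires.items()
    }
    return core, variable, missing


# Toy repertoires (assumption): placeholder taxa, not the genomes of table 1.
repertoires = {
    "taxon_1": ["rbcL", "psbA", "ssrA", "rpl32"],
    "taxon_2": ["rbcL", "psbA", "rpl32"],
    "taxon_3": ["rbcL", "psbA"],
}
core, variable, missing = core_and_variable(repertoires)
print(sorted(core), sorted(variable), missing)
```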
FIG. 3. Gene repertoires of the chloroplast genomes examined in this study. Only the genes that are missing in one or more genomes are indicated. The presence of a standard gene is denoted by a blue box. A total of 91 genes are shared by all compared genomes that have been completely sequenced: accD, atpA, B, E, F, H, I, cemA, clpP, ftsH, petA, B, D, G, psaA, B, C, I, J, M, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z, rbcL, rpl2, 5, 12, 14, 16, 19, 20, 23, 36, rpoA, B, C1, C2, rps2, 3, 7, 8, 9, 11, 12, 18, 19, rrf, rrl, rrs, tufA, ycf1, 3, 4, 20, trnA(ugc), C(gca), D(guc), E(uuc), F(gaa), G(gcc), G(ucc), H(gug), I(gau), K(uuu), L(uaa), L(uag), Me(cau), Mf(cau), N(guu), P(ugg), Q(uug), R(ucu), R(acg), S(gcu), S(uga), T(ugu), V(uac), W(cca), and Y(gua). Eight of these genes (petG, psbI, trnI(gau), L(uaa), P(ugg), R(ucu), S(gcu), T(ugu)) have not been identified in the partial chloroplast genome sequence of T. aggregata. Note that ycf12 (psb30) codes for a subunit of the photosystem II complex (Kashino et al. 2007).

Three protein-coding genes, the rpoB and rpoC2 genes encoding subunits of the RNA polymerase and the tilS gene encoding the tRNA(Ile)-lysidine synthetase, are fragmented and are not associated with sequences typical of group I or group II introns in several core trebouxiophyceans (supplementary fig. S3, Supplementary Material online). The pieces of these fragmented genes are contiguous on all genome sequences, except for the fragments of the Xylochloris irregularis and Watanabea reniformis tilS. Fragmented structures have been previously reported for rpoB (Bélanger et al. 2006; de Cambiaire et al. 2006, 2007; Turmel et al. 2008; Brouard et al. 2008, 2010, 2011) and rpoC2 (Turmel et al. 2008) of other core chlorophytes and in the case of rpoB, it was observed that the genes of Leptosira terrestris and of three chlorophycean green algae are fragmented at the same site, near the junction of a conserved segment of 80 codons and a highly variable region (de Cambiaire et al. 2007). Alignments of the proteins encoded by the rpoB, rpoC2, and tilS genes examined in the present investigation revealed that all these genes, except the S. bacillaris rpoB, share common fragmentation sites. Note that the P. brevispinosa rpoC2 gene exhibits three sites of fragmentation: The first site corresponds to that found in the Chlamydomonas moewusii gene (Turmel et al. 2008), whereas the second site corresponds to those in the S. bacillaris and W. reniformis genes.

The standard genes accounting for the variable coding capacity of the analyzed chloroplast genomes consist of 16 protein-coding genes, 10 tRNA genes and the gene for tmRNA (ssrA), a small regulatory RNA that has both mRNA and tRNA activities and interacts with stalled ribosomes to resume translation on the SsrA mRNA moiety (supplementary fig. S4, Supplementary Material online). Although rpl32 is missing only in the incompletely sequenced genome of P. brevispinosa, this gene loss appears genuine because our search for rpl32 in the sequence assembly of total cellular DNA proved unsuccessful. The ssrA gene is present in the three taxa sampled from the Pedinophyceae but appears to be absent from the chloroplast genomes of the investigated trebouxiophyceans. This gene was previously identified in the chloroplasts of the streptophyte Mesostigma viride and the prasinophycean Nephroselmis olivacea (Gueneau de Novoa and Williams 2004). In the course of this study, we also localized it in the recently sequenced cpDNAs of two additional prasinophyceans (Nephroselmis astigmatica and Picocystis salinarum) (Lemieux et al. 2014b), using a Smith–Waterman search for similarity and the 5′ and 3′ conserved regions of standard one-piece tmRNAs as query sequences. The 5′ and 3′ terminal sequences composing the tRNA-like domains as well as the internal mRNA-like coding region are conserved in chlorophyte and streptophyte ssrA genes (supplementary fig. S4, Supplementary Material online). All six known chlorophyte ssrA genes reside in the immediate vicinity of rbcL and are encoded on the same DNA strand (supplementary fig. S1, Supplementary Material online).

Although the trnR(ccu) gene is restricted to three taxa of the Elliptochloris clade, trnK(cuu), trnL(aag) and trnP(ggg) are found exclusively in S. bacillaris, Myrmecia israelensis and T. aggregata, respectively (fig. 3). Our BLASTN similarity searches against the nonredundant database of NCBI suggest that each of the four tRNA genes arose from duplication and subsequent sequence divergence of an existing chloroplast gene: trnR(ccu) originated from trnR(ucu), trnK(cuu) from trnK(uuu), trnL(aag) from trnL(uag), and trnP(ggg) from trnG(gcc). Prior to our study, the prasinophycean Pycnococcus provasolii was the only known chlorophyte carrying trnP(ggg) in its chloroplast (Turmel, Gagnon, et al. 2009) and aside from Coccomyxa subellipsoidea, trnR(ccu) had been localized only in the chloroplasts of the chlorophycean Oedogonium cardiacum (Brouard et al. 2008) and the ulvophyceans Pseudendoclonium akinetum and Oltmannsiellopsis viridis (Pombert et al. 2005, 2006).

Additional tRNA genes (trnD(guc), trnE(uuc), trnG(gcc), and trnN(guu)) were likely duplicated in other lineages of the Trebouxiophyceae, yielding identical copies. The two trnG(gcc) sequences in the IR-less genome of M. geminata are either the products of a duplication event or, as mentioned earlier, the remnants of an ancestral IR. One of the two trnN(guu) loci in the partially assembled O. solitaria genome lies just 3′ of the rDNA operon and may therefore be part of the IR (supplementary fig. S1, Supplementary Material online). The Dicloster acuatus trnD(guc) sequences are present in the IR and LSC regions, whereas both loci of P. brevispinosa trnE(uuc) map within the IR. Duplicates of the latter gene have also been reported in the chloroplasts of other chlorophytes (de Cambiaire et al. 2006; Brouard et al. 2010).

Proportion of G+C in Standard Protein-Coding Genes

Given the important range of variation in G+C content observed for the compared chloroplast genomes, we examined the G+C composition of protein-coding genes at each codon position among these genomes using the concatenated nucleotide data set (79 genes from 63 taxa, 15,468 codons) analyzed by Lemieux et al. (2014a). We found that the G+C content at third codon positions ranges from 10% to 25% in the majority of examined chlorophyte cpDNAs (supplementary fig. S5, Supplementary Material online). In contrast, higher G+C values ranging from 29% (Elliptochloris bilobata) to 64% (Trebouxiophyceae sp. MX-AZ01) are observed at third codon positions for the G+C-biased genomes characterizing all members of the Elliptochloris+Choricystis clade. The compositional bias is much less pronounced at the functionally constrained first and second codon positions. Interestingly, two other chlorophytes with a relatively high G+C content in their chloroplast genomes (X. irregularis, 39.7% G+C and Marsupiomonas sp. NIES 1824, 40.3% G+C) have a G+C content of more than 30% at the third codon positions of their protein-coding genes.

Unusual Genes

We discovered potential coding sequences that are not usually found in green plant chloroplast genomes by carrying out BLASTP similarity searches against the nonredundant NCBI database using as query sequences free-standing ORFs of more than 100 codons. ORFs showing similarities (E-value threshold of 1e-06) with proteins of known functions and/or recognized protein domains were identified in 13 of the examined genomes and grouped into ten categories according to their putative function/domain (table 2). All of these ORFs encode putative products acting on DNA or RNA. In all five instances where a given ORF is found in different species, we find that the latter belong to different lineages. Three individual genomes, those of Paradoxia multiseta, Prasiolopsis sp., and Dicloster acuatus, exhibit two or more ORFs with the same function and/or recognized protein domain; in these cases, nonidentical copies are present in each genome.

Interestingly, three members of the Prasiola clade (Neocystis brevis, Pabia signiensis, and "Chlorella" mirabilis) share with the deep-sea γ-proteobacterium Marinobacter manganoxydans an ORF encoding a hypothetical protein
(supplementary fig. S6, Supplementary Material online). This hypothetical gene is located within the IR in Pa. signiensis, at two distinct sites in "Chlorella" mirabilis and at four sites in N. brevis (supplementary fig. S1, Supplementary Material online).

Gene Partitioning Patterns between the IR and Single-Copy Regions

All 18 chloroplast genomes containing a large rDNA-encoding IR, with a single exception (the X. irregularis genome), display a pattern of gene partitioning that closely resembles the pattern observed for several prasinophycean and streptophyte algae (fig. 4). However, a few genes typically located 5′ of the rDNA operon (i.e., near the LSC region) in prasinophycean and streptophyte genomes are found 3′ of the rDNA operon or near/within the SSC region in the pedinophycean and most trebouxiophycean IR-containing genomes. In addition, genes ancestrally located 3′ of the rDNA operon (i.e., near/within the SSC region) have been shifted to the LSC side in four of the analyzed genomes. The genomes of the two trebouxiophyceans from the Oocystis clade show the most similarity to the ancestral partitioning pattern.

All genes found in the pedinophycean IRs, with two exceptions (psbA and trnS(gcu)), are also IR-encoded in all four members of the Geminella clade. Included in this gene set are the genes present in the SSC regions of the prasiolalean genomes, which are not typically found in the SSC region in genomes exhibiting the ancestral partitioning pattern. Based on the gene content differences observed for the IR in trebouxiophycean lineages, it is clear that the IR/SSC and IR/LSC boundaries each underwent frequent shifts in both directions (i.e., either toward the neighboring single-copy region or toward the IR) during evolution.

Numerous differences in gene order are also observed between the IRs of the analyzed genomes. For example, in the IRs of Marsupiomonas and three of the four representatives of the Geminella clade, the position of the rDNA operon relative to the SSC region differs from that generally found in prasinophycean and streptophyte cpDNAs, implying that inversions of sequences within the IR occurred frequently. Genes within the rDNA operon were not spared from such rearrangements in

Table 2
Nonstandard Genes Identified as Freestanding ORFs in the Chloroplast Genomes Examined in This Study

Taxon                            ORF^a   Genomic Coordinates   Conserved Domain
Neocystis brevis                 148     110000–109554         DNA breaking-rejoining enzymes, C-terminal catalytic domain (cd00397)
Paradoxia multiseta              119     105729–106088         DNA breaking-rejoining enzymes, C-terminal catalytic domain (cd00397)
Paradoxia multiseta              298     35212–36108           DNA breaking-rejoining enzymes, C-terminal catalytic domain (cd00397)
Prasiolopsis sp. SAG 84.81       154     277159–277623         Integrase core domain (pfam00665)
Prasiolopsis sp. SAG 84.81       298     296164–297060         Integrase core domain (pfam00665)
Prasiolopsis sp. SAG 84.81       200     274101–274703         Putative integrase/recombinase
Dictyochloropsis reticulata      102     51790–51482           Serine recombinase family, resolvase and invertase subfamily, catalytic domain (cd03768)
Botryococcus braunii             117     24161–23808           Phage-associated DNA primase (COG3378)
Prasiolopsis sp. SAG 84.81       653     183296–185257         Phage/plasmid primase, P4 family, C-terminal domain (TIGR01613)
Watanabea reniformis             403     111049–112260         Primase C terminal 1 (smart00942)
Dicloster acuatus                153     116412–116873         DNA polymerase type-B family catalytic domain (cd00145)
Dicloster acuatus                328     93575–94561           DNA polymerase type-B alpha subfamily catalytic domain (cd05532)
Marvania geminata                242     8571–7843             Deoxyribonucleoside kinase (cd01673)
Microthamnion kuetzingianum      139     12286–12705           Type II restriction endonuclease NlaIII; HNH endonuclease
Pedinomonas tuberculata          214     109148–109792         HaeIII restriction endonuclease (pfam09556)
Chlorella variabilis             123^b   99567–99938           N-6 DNA methylase (pfam02384)
Chlorella variabilis             338^b   100401–101417         N-6 DNA methylase (pfam02384)
Chlorella variabilis             152^b   101377–101835         N-6 DNA methylase (pfam02384)
Chlorella variabilis             175^c   19229–19756           LAGLIDADG DNA endonuclease family (pfam00961)
Neocystis brevis                 331^c   21453–22448           LAGLIDADG DNA endonuclease family (pfam03161)
Trebouxiophyceae sp. MX-AZ01     119^c   7868–8227             LAGLIDADG DNA endonuclease family (pfam03161)
Dictyochloropsis reticulata      671^c   127765–129780         Reverse transcriptase with group II intron origin (cd01651)
Pleurastrosarcina brevispinosa   214^c   282883–283527         Reverse transcriptase with group II intron origin (cd01651); group II intron, maturase-specific domain (pfam08388)

a Reported here are the freestanding ORFs larger than 100 codons that revealed similarity (E-value threshold of 1e-06) with proteins of known function and/or recognized protein domains in our BLASTP searches. Each ORF is identified by the number of amino acid residues in the encoded protein.
b The orf123, orf338, and orf152 of Chlorella variabilis may be part of a larger ORF considering that they are contiguous on the genome sequence and all show similarity to N-6 DNA methylases.
c These ORFs are not encoded within recognizable group I and group II intron sequences and thus appear to be free-standing.
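Because each ORF in Table 2 is named for the number of residues in its product (footnote a), the ORF name and the genomic coordinates can be checked against each other. The following is a minimal sketch of such a check. It rests on assumptions that the table does not state: that the coordinates are 1-based and inclusive, that the span includes the stop codon, and that a start coordinate larger than the end coordinate denotes the reverse strand. The function name `orf_residues` and the choice of rows are illustrative only.

```python
def orf_residues(start: int, end: int) -> int:
    """Residues encoded by an ORF with inclusive genomic coordinates.

    Assumption (not stated in Table 2): the span includes the stop codon,
    so an ORF of N residues covers 3 * (N + 1) nucleotides.
    start > end is read as a reverse-strand ORF.
    """
    span = abs(end - start) + 1
    if span % 3 != 0:
        raise ValueError(f"span of {span} nt is not a multiple of 3")
    return span // 3 - 1


# A few rows copied from Table 2: (taxon, ORF name, start, end)
rows = [
    ("Neocystis brevis", 148, 110000, 109554),
    ("Paradoxia multiseta", 119, 105729, 106088),
    ("Prasiolopsis sp. SAG 84.81", 653, 183296, 185257),
    ("Marvania geminata", 242, 8571, 7843),
    ("Dictyochloropsis reticulata", 671, 127765, 129780),
]

for taxon, name, start, end in rows:
    strand = "-" if start > end else "+"
    # True when the coordinates agree with the ORF name
    print(taxon, name, strand, orf_residues(start, end) == name)
```

Under these assumptions, each of the selected rows is consistent with its ORF name.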
FIG. 4. Gene partitioning patterns of the IR-containing chloroplast genomes examined in this study. The IRs span the sequence delimited by thick vertical lines; only the IR/LSC junction was identified in the O. solitaria genome, with the sequence corresponding to the dotted lines being most likely part of the IR. Note that the gene sequences spanning the IR/SSC or IR/LSC junction are represented in the SSC or LSC region, respectively. The five genes composing the rDNA operon are highlighted in yellow. The color assigned to each of the remaining genes is dependent upon the position of the corresponding gene relative to the rDNA operon in previously reported IR-containing prasinophycean and streptophyte cpDNAs displaying an ancestral gene partitioning pattern. The genes highlighted in blue are found within or near the SSC region in ancestral genomes (downstream of the rDNA operon), whereas those highlighted in orange are found within or near the LSC region (upstream of the rDNA operon).
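The convention stated in the Fig. 4 caption, that a gene spanning an IR/single-copy junction is drawn in the single-copy region, can be written as a small rule. The sketch below is illustrative only: the region boundaries and gene coordinates are invented values, not taken from any genome in this study, and `Region` and `assign_region` are hypothetical helpers.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str   # "LSC", "IR" or "SSC"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def assign_region(gene_start: int, gene_end: int, regions: list[Region]) -> str:
    """Return the region in which a gene is drawn.

    A gene lying entirely inside one region gets that region. A gene that
    overlaps both the IR and a single-copy region is assigned to the
    single-copy region, following the Fig. 4 convention.
    """
    lo, hi = sorted((gene_start, gene_end))
    hits = [r.name for r in regions if lo <= r.end and hi >= r.start]
    if not hits:
        raise ValueError("gene lies outside all regions")
    single_copy = [name for name in hits if name != "IR"]
    return single_copy[0] if single_copy else "IR"


# Hypothetical linear layout with one IR copy (coordinates are invented).
layout = [
    Region("LSC", 1, 80_000),
    Region("IR", 80_001, 95_000),
    Region("SSC", 95_001, 110_000),
]

print(assign_region(82_000, 83_500, layout))  # gene entirely inside the IR
print(assign_region(79_500, 80_400, layout))  # gene spanning the LSC/IR junction
print(assign_region(94_800, 95_300, layout))  # gene spanning the IR/SSC junction
```

Treating only one IR copy keeps the sketch short; a circular genome with both copies would need the second IR interval and wrap-around handling.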