
Characterization of the Complete Chloroplast Genome of Apple (Malus × domestica, Rosaceae)



Plant Diversity and Resources, 2014, 36(4): 468-484. DOI: 10.7677/ynzwyj201413188. CLC number: Q75. Document code: A. Article ID: 2095-0845(2014)04-468-17.

JIN Gui-Hua (1, 2), CHEN Si-Yun (1), YI Ting-Shuang (1), ZHANG Shu-Dong (1, author for correspondence)

1 Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China; 2 University of Chinese Academy of Sciences, Beijing 100049, China

Abstract: Apple (Malus × domestica) is one of the most important temperate fruits. To better understand the molecular basis of this species, we characterized the complete chloroplast (cp) genome sequence downloaded from the Genome Database for Rosaceae. The cp genome of apple is a circular molecule of 160 068 bp with a typical quadripartite structure: two inverted repeats (IRs) of 26 352 bp each, separated by a small single-copy (SSC) region of 19 180 bp and a large single-copy (LSC) region of 88 184 bp.
A total of 135 predicted genes were identified (115 unique genes, with another 20 duplicated in the IRs), including 81 protein-coding genes, four rRNA genes and 30 tRNA genes. Three genes, ycf15, ycf68 and infA, contain several internal stop codons and were interpreted as pseudogenes. The genome structure, gene order, GC content and codon usage of apple are similar to those of typical angiosperm cp genomes. Thirty repeat regions (≥30 bp) were detected: twenty-one tandem, six forward and three inverted repeats. Two hundred thirty-seven simple sequence repeat (SSR) loci were revealed, most of them composed of A or T, contributing to a distinct bias in base composition. Additionally, an average 10 000 bp of non-coding region contains 24 SSR sites, while the same length of protein-coding region contains five, indicating an uneven distribution of SSRs. The complete cp genome sequence of apple reported in this paper will facilitate future studies of its population genetics, phylogenetics and chloroplast genetic engineering.

Key words: Apple; Chloroplast genome; Repeat analysis; SSRs

Funding: The Ministry of Science and Technology, China, Basic Research Project (2013FY112600), the National Natural Science Foundation of China (31200172), and the Talent Project of Yunnan Province (2011CI042).
Author for correspondence; E-mail: sdchang@mail.kib.ac.cn
Received date: 2013-09-16, Accepted date: 2013-11-18
About the first author: JIN Gui-Hua (born 1988), female, master's student, working mainly on plant bioinformatics. E-mail: jinguihua@mail.kib.ac.cn

Apple, Malus × domestica Borkh., belongs to the tribe Pyreae of Rosaceae (Potter et al., 2007) and is cultivated all over the world except in tundra climates and the arctic regions. Apple is one of the oldest and most economically important temperate fruits. Globally, there are more than 7 500 known cultivars of apples, resulting in a range of desired characteristics. According to data from the Food and Agriculture Organization of the United Nations, the total apple production in 2010 was about 69 million tons, and the overall area of apple plantation was 5.62 million hectares (www.fao.org). Apple is considered to have high economic value, but the species is highly susceptible to a number of fungal and bacterial diseases and insect pests, which annually reduce the harvest by 12% to 25%. However, introduction or deletion of target genes by means of conventional hybridization is generally costly, inefficient and slow because of the high heterozygosity and long juvenile period of apple plants.

The chloroplasts (cp), considered to have originated from cyanobacteria through endosymbiosis, are the photosynthetic organelles that provide essential energy for plants and algae (Howe et al., 2003; Gray, 1989). This intracellular organelle encodes a number of chloroplast-specific components and is involved in major functions such as sugar synthesis, starch storage, the production of several amino acids, lipids, vitamins and pigments, and also in key sulfur and nitrogen metabolic pathways (Martin et al., 2013). Earlier studies using restriction site mapping demonstrated that gene content, gene order, and genome organization of cp genomes are largely conserved within land plants (Raubeson and Jansen, 2005; Palmer, 1991). However, with the increasing number of whole cp genomes available, many structural rearrangements, large IR expansions/contractions and gene losses have been found (Chumley et al., 2006; Millen et al., 2001; Guisinger et al., 2010). These events, coupled with the sequences per se, provide sufficient information for genome-wide evolutionary studies. The cp genome has shown great potential in resolving phylogenetic questions at both high and low taxonomic levels, and complete cp genome sequences are sometimes necessary for resolving complex evolutionary relationships (Givnish et al., 2010; Downie and Palmer, 1992; Jansen et al., 2007). Meanwhile, comparative analysis of cp genomes from distant and closely related species will facilitate the association of important traits controlled by plastid genomes (Liu et al., 2013).

Velasco et al. (2010) reported a high-quality draft genome sequence of apple and reconstructed the phylogeny of the genus Malus using 23 nuclear genes; the progenitor of the cultivated apple was identified as M. sieversii. Compared with the nuclear genome sequence, our understanding of apple's cp genome lags behind, although the complete chloroplast genome of apple was released along with the nuclear genome sequence (Velasco et al., 2010, http://www.rosaceae.org/projects/apple_genome). In this article, we annotated the cp genome of apple in detail. In addition, we determined the distribution and location of microsatellites (SSRs) and repeats in the apple cp genome. The cp genome information obtained should be widely useful for population genetics and breeding programs.

1 Materials and methods

1.1 Genome annotation

Velasco et al. (2010) assembled the Malus × domestica cp genome sequence with 847× coverage. This high-quality cp genome sequence can be downloaded from the Genome Database for Rosaceae (GDR; http://www.rosaceae.org/projects/apple_genome). The cp genome was annotated using the program DOGMA (Wyman et al., 2004), coupled with manual corrections for start and stop codons and intron/exon boundaries. The tRNA genes were identified using DOGMA and tRNAscan-SE (Schattner et al., 2005). Codon usage was analyzed using a VB script. The circular cp genome map was drawn using the OGDRAW program (Lohse et al., 2007).

1.2 Repeat analysis

REPuter (Kurtz et al., 2001) was used to visualize both forward and inverted repeats. The minimal repeat size was set to 30 bp and the identity of repeats was no less than 90% (Hamming distance equal to 3). Tandem repeats were analyzed using Tandem Repeats Finder (TRF) v4.04 (Benson, 1999) with parameter settings as described by Nie et al. (2012). Overlapping repeats were merged into one repeat motif whenever possible. A given region in the genome was designated as only one repeat type, and the tandem type took priority if one repeat motif could be identified as both a tandem repeat and another type.

1.3 SSR analysis

We detected SSRs of 8 bp or longer in the apple cp genome. This threshold was set because SSRs of 8 bp or longer are prone to slip-strand mispairing, which is thought to be the primary mutational mechanism causing their high level of polymorphism (Huotari and Korpelainen, 2012; Raubeson et al., 2007; Rose and Falush, 1998). Microsatellite (mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide repeat) detection was performed using MISA (Thiel et al., 2003) with a minimum number of repeats of 8, 4, 4, 3, 3 and 3 for unit sizes of 1, 2, 3, 4, 5 and 6, respectively. SSR analysis considered only one inverted repeat region (IRb). All of the repeats found were manually verified, and redundant results were removed.

2 Results

2.1 Genome organization

The complete cp genome of apple is a circular DNA molecule of 160 068 bp with a quadripartite structure typical of the majority of land plant chloroplast chromosomes. It has the largest cp genome size among five Rosaceae species (Table 1). The cp genome harbors a pair of identical inverted repeat regions (IRa and IRb) of 26 352 bp each. The inverted repeat regions are separated by the large (LSC) and small (SSC) single-copy regions of 88 184 and 19 180 bp, respectively (Table 1, Fig. 1). The IRs span from rps19 to a portion of ycf1. The overall GC content of the apple cp genome is 36.5%: 42.7% within the inverted repeat regions, and 34.2% and 30.4% within the LSC and SSC, respectively (Table 2). The high GC content of the IRs is caused by four GC-rich rRNA genes (with an average GC content of 55.5%).

Table 1 Summary of the Rosaceae cp genome features

| Taxon | GenBank | Genome size/bp | LSC length/bp | IRa length/bp | SSC length/bp | Reference |
|---|---|---|---|---|---|---|
| Fragaria vesca subsp. vesca | NC_015206 | 155691 | 85606 | 25555 | 18175 | Shulaev et al., 2011 |
| Pentactina rupicola | NC_016921 | 156612 | 84970 | 26350 | 19237 | Lee and Hong, 2011 |
| Prunus persica | NC_014697 | 157790 | 85969 | 26381 | 19060 | Jansen et al., 2011 |
| Pyrus pyrifolia | NC_015996 | 159922 | 87901 | 26392 | 19237 | Terakami et al., 2012 |
| Malus × domestica | | 160068 | 88184 | 26352 | 19180 | This study |

Fig. 1 Map of the apple cp genome. The thick lines indicate the extent of the IRs (IRa and IRb), which separate the genome into SSC and LSC regions. Genes lying outside the map are transcribed clockwise, whereas genes inside are transcribed counterclockwise. Genes belonging to different functional groups are color coded. The dashed darker-gray area in the inner circle indicates GC content, while the lighter gray corresponds to the AT content of the genome.

Table 2 Base composition in the apple chloroplast genome

| Genome features | A/% | T(U)/% | G/% | C/% | Length/bp |
|---|---|---|---|---|---|
| LSC | 32.2 | 33.6 | 16.6 | 17.6 | 88184 |
| SSC | 34.8 | 34.8 | 14.5 | 15.9 | 19180 |
| IRa | 28.6 | 28.7 | 20.6 | 22.1 | 26352 |
| IRb | 28.7 | 28.6 | 22.1 | 20.6 | 26352 |
| Total | 35.1 | 32.1 | 17.9 | 18.6 | 160068 |
| CDS | 30.9 | 31.4 | 20.1 | 17.5 | 79650 |
| 1st codon position | 31.0 | 23.8 | 26.7 | 18.5 | 26550 |
| 2nd codon position | 29.6 | 32.5 | 17.8 | 20.1 | 26550 |
| 3rd codon position | 32.2 | 38.0 | 15.9 | 13.9 | 26550 |

CDS: Coding DNA Sequence

2.2 Gene content

The positions of all the genes identified in the apple cp genome and the category-wise distribution of these genes are presented in Figure 1 and Table 3. The apple cp genome encodes 135 predicted genes, of which 115 are unique. The unique genes include 81 protein-coding, 30 tRNA and four rRNA genes (Table 3). Nine protein-coding, seven tRNA and all four rRNA genes are duplicated in the IR regions. Protein-coding genes, tRNAs and rRNAs make up 47.9%, 1.7% and 5.4% of the genome, respectively, while introns and intergenic spacers constitute the remaining 45.0%. The LSC region contains 61 protein-coding genes and 22 tRNA genes, whereas the SSC region contains 11 protein-coding genes and one tRNA gene. Eighteen genes in the apple cp genome contain introns, three of which (clpP, rps12 and ycf3) contain two introns (Table 4). The trnK-UUU gene has the largest intron (2 516 bp), within which another gene, matK, is nested. For the rps12 gene, the 5' exon is located in the LSC region, and the 3' exon is located in the IR regions. The ycf1 and rps19 genes are located in the boundary regions between IRb/SSC and IRa/LSC, respectively. Incomplete duplications of the normal copies of ycf1 and rps19 at these boundaries have resulted in a lack of protein-coding ability. The psbD-psbC and ycf1-ndhF pairs are two cases of overlapping genes.

Table 3 Genes present in the apple chloroplast genome

| Group of genes | Gene names |
|---|---|
| Photosystem I | psaA, psaB, psaC, psaI, psaJ |
| Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
| Cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN |
| ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
| NADH dehydrogenase | ndhA*, ndhB*(×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| RubisCO large subunit | rbcL |
| RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
| Ribosomal proteins (SSU) | rps2, rps3, rps4, rps7(×2), rps8, rps11, rps12**(×2), rps14, rps15, rps16*, rps18, rps19 |
| Ribosomal proteins (LSU) | rpl2*(×2), rpl14, rpl16*, rpl20, rpl22, rpl23(×2), rpl32, rpl33, rpl36 |
| Other genes | clpP**, matK, accD, ccsA, infA, cemA |
| Proteins of unknown function | ycf1, ycf2(×2), ycf3**, ycf4, ycf15(×2), ycf68(×2) |
| Transfer RNAs | 37 tRNAs (6 contain an intron, 7 in the IRs) |
| Ribosomal RNAs | rrn4.5(×2), rrn5(×2), rrn16(×2), rrn23(×2) |

One or two asterisks after a gene name indicate that the gene contains one or two introns, respectively.

Table 4 The genes with introns in the apple cp genome and the length of the exons and introns

| Gene | Location | Exon I/bp | Intron I/bp | Exon II/bp | Intron II/bp | Exon III/bp |
|---|---|---|---|---|---|---|
| atpF | LSC | 411 | 733 | 144 | | |
| clpP | LSC | 228 | 650 | 291 | 824 | 69 |
| ndhA | SSC | 540 | 1141 | 552 | | |
| ndhB | IR | 756 | 670 | 777 | | |
| petB | LSC | 6 | 798 | 642 | | |
| petD | LSC | 9 | 725 | 474 | | |
| rpl16 | LSC | 399 | 989 | 9 | | |
| rpl2 | IR | 435 | 687 | 390 | | |
| rpoC1 | LSC | 1611 | 742 | 435 | | |
| rps12* | LSC | 114 | - | 26 | 542 | 231 |
| rps16 | LSC | 231 | 860 | 42 | | |
| trnA-UGC | IR | 38 | 808 | 35 | | |
| trnG-GCC | LSC | 23 | 707 | 37 | | |
| trnI-GAU | IR | 42 | 944 | 35 | | |
| trnK-UUU | LSC | 35 | 2516 | 37 | | |
| trnL-UAA | LSC | 37 | 515 | 50 | | |
| trnV-UAC | LSC | 37 | 593 | 39 | | |
| ycf3 | LSC | 153 | 745 | 228 | 709 | 126 |

* rps12 is a trans-spliced gene with the 5' end located in the LSC region and the duplicated 3' end in the IR regions.
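The region lengths given in section 2.1 and the per-region base composition in Table 2 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' procedure: the dictionary layout and the function names `total_length` and `weighted_gc` are illustrative assumptions, and the GC percentage of each region is taken as the sum of the G and C columns of Table 2.

```python
# Region lengths (bp) and GC percentages as reported in section 2.1 and Table 2.
# The data layout and function names are illustrative, not from the paper.
REGIONS = {
    "LSC": {"length": 88184, "gc": 34.2},
    "SSC": {"length": 19180, "gc": 30.4},
    "IRa": {"length": 26352, "gc": 42.7},
    "IRb": {"length": 26352, "gc": 42.7},
}
REPORTED_TOTAL_BP = 160068
REPORTED_TOTAL_GC = 36.5


def total_length(regions):
    """Sum the lengths of the four regions of the quadripartite genome."""
    return sum(r["length"] for r in regions.values())


def weighted_gc(regions):
    """Length-weighted mean GC percentage across all regions."""
    total = total_length(regions)
    return sum(r["length"] * r["gc"] for r in regions.values()) / total


if __name__ == "__main__":
    # LSC + SSC + 2 x IR should reproduce the reported genome size.
    print(total_length(REGIONS))
    # The weighted mean should agree with the reported overall GC content
    # to within rounding of the per-region percentages.
    print(round(weighted_gc(REGIONS), 1))
```

With an IR length of 26 352 bp the four regions sum to the reported 160 068 bp, and the length-weighted GC content agrees with the reported 36.5% to one decimal place.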
2.3 Codon usage

Based on the sequences of protein-coding genes and tRNA genes within the chloroplast genome, the relative synonymous codon usage (RSCU) (Sharp and Li, 1986) was deduced for the apple genome and summarized in Supplementary Table 1. The codon usage of the apple chloroplast genome strongly reflects the AT bias. Within the coding DNA sequence (CDS), the AT content at the first, second and third codon positions is 54.8%, 62.1% and 70.2%, respectively (Table 2). Moreover, the 81 protein-coding genes comprise 79 650 bp coding for 26 550 codons. Among these codons, 2 781 (10.5%) encode leucine and 307 (1.1%) encode cysteine, which are the most and least prevalent amino acids, respectively. The highest codon usage was observed for ATT, encoding isoleucine (Ile). High codon usage was also observed for Lysine (Lys) and Glutamine (Glu) (Supplementary Table 1). Instead of the common ATG start codon, we identified GTG as the start codon for rps19. All three stop codons are present, with UAA being the most frequently used (UAA 58.8%, UAG 23.3% and UGA 17.8%).

2.4 Non-functional genes

The ycf15 gene employs GTG as its start codon and contains several stop codons, which indicates that it is most likely a non-functional gene. The reading frame of this gene contains one inserted 'ACTA' unit, causing a frameshift and the resulting internal stop codons (Fig. 2: A). The ycf68 gene, on the other hand, is a truncated pseudogene with accumulated stop codons in its reading frame, caused by the absence of one 'AAAC' unit and two deletion events (13 bp in total) (Fig. 2: B). We also found that the infA gene is probably non-functional in the apple chloroplast genome due to the presence of several premature stop codons caused by the insertion of one 'TATC' unit (Fig. 2: C).

2.5 Repeat analysis

For repeat structure analysis, we detected six direct, three inverted and 21 tandem repeats in the apple cp genome (Supplementary Table 2). Most of these repeats are between 30 and 41 bp long (Fig. 3: A). The longest repeat, of 91 bp, is located in the intergenic region between psbZ and trnG-UCC within the LSC. Tandem repeats, accounting for 70% of total repeats, are the most common of the three repeat types (Fig. 3: B). Most of the repeats (76%) are distributed within the intergenic spacer regions, with 8% in the introns, 8% in the CDS regions and 8% in the tRNA genes (Fig. 3: C).

2.6 SSR analysis

Chloroplast simple sequence repeats (SSRs) of apple were examined and are listed in Supplementary Table 3, along with their nucleotide sequences and positions within the cp genome. We identified 237 SSR loci (≥8 bp) in total, of which 164 are mononucleotide, 68 dinucleotide, four tetranucleotide, and one hexanucleotide. Among these cpSSRs, the longest is a poly-T of 26 bp. The majority of mononucleotide repeats are composed of A (64) or T (94), while only six are composed of tandem G or C. Most cpSSRs are 8 to 10 bp long (62 with 8 bp, 39 with 9 bp, 21 with 10 bp), accounting for 51.48% (122/237) of all cpSSRs. CpSSRs are unevenly distributed across the genome: 175 in the LSC, 23 in the IRb, and 39 in the SSC region. Analysis of function-related location revealed that 158 cpSSRs are located in intergenic spacer regions, 38 in introns, and 41 in the CDS of 18 genes, among which 17 genes were found to harbor at least two SSRs.

3 Discussion

3.1 Genome Organization

In general, the size of photosynthetic land plant plastid chromosomes ranges from 108 kb to 165 kb (Palmer, 1991; Raubeson and Jansen, 2005). The cp genome of apple is near the upper boundary and is also the largest among the five available Rosaceae cp genomes. It is about 0.1 kb, 2.2 kb, 3.4 kb and 4.3 kb larger than those of Pyrus pyrifolia, Prunus persica, Pentactina rupicola and Fragaria vesca subsp. vesca, respectively. The genome size variation is mainly caused by differences in the length of the SSC and IR regions (Table 1).
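The size differences quoted above follow directly from the genome sizes in Table 1. As a small worked check (a sketch only: the helper names `size_excess` and `truncate_kb` are hypothetical, and truncating rather than rounding to one decimal is an assumption that happens to reproduce the quoted figures):

```python
# Genome sizes in bp, taken from Table 1 ("x" stands in for the hybrid sign).
GENOME_SIZE_BP = {
    "Fragaria vesca subsp. vesca": 155691,
    "Pentactina rupicola": 156612,
    "Prunus persica": 157790,
    "Pyrus pyrifolia": 159922,
    "Malus x domestica": 160068,
}


def size_excess(sizes, focal="Malus x domestica"):
    """Return {taxon: bp by which the focal genome is larger than that taxon}."""
    ref = sizes[focal]
    return {taxon: ref - bp for taxon, bp in sizes.items() if taxon != focal}


def truncate_kb(bp):
    """Express a length in kb, truncated (not rounded) to one decimal place."""
    return (bp // 100) / 10


if __name__ == "__main__":
    for taxon, diff in size_excess(GENOME_SIZE_BP).items():
        print(f"{taxon}: {diff} bp, about {truncate_kb(diff)} kb")
```

The differences come out as 146, 2 278, 3 456 and 4 377 bp for Pyrus, Prunus, Pentactina and Fragaria; truncated to one decimal they match the 0.1, 2.2, 3.4 and 4.3 kb given in the text, whereas conventional rounding would give 2.3, 3.5 and 4.4 kb for the last three.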
Fig. 2 Alignment of three pseudogenes. A. Alignment of the ycf15 gene and protein sequences in the two representative species of angiosperms [Nicotiana (NC_001879) and Atropa (NC_004561)]. Black asterisks indicate stop codons in the protein. Red arrows indicate the insertion region 'TCTA' in apple. B. Alignment of the ycf68 gene and protein sequences in the two representative species of angiosperms [Zea (NC_001666) and Oryza (NC_001320)]. Red arrows or boxes indicate the 'AAAC' unit missing in apple. C. Alignment of the infA gene and protein sequences in the three representative species of angiosperms [Vitis (NC_007957) and Sesamum (NC_016433)]. Red arrows indicate the 'AGAT' unit missing in apple.

Fig. 3 Repeat structure analysis in the apple cp genome. The cutoff value is 15 bp for tandem repeats and 30 bp for dispersed repeats. (A) Histogram showing the number of repeats in the apple chloroplast genome. (B) Composition of the 30 repeats. (C) Location of the 30 repeats.

The apple cp genome exhibits gene order and content largely identical to those of most sequenced angiosperm cp genomes, emphasizing the highly conserved nature of these land plant cp genomes (Wicke et al., 2011). Its GC content is in accordance with that of typical angiosperm cp genomes (Shinozaki et al., 1986; Kim and Lee, 2004; Hiratsuka et al., 1989; Sato et al., 1999; Terakami et al., 2012). The codon usage bias towards a higher AT representation at the third codon position has also been observed in other land plant cp genomes (Yang et al., 2010; Nie et al., 2012; Yi and Kim, 2012; Tangphatsornruang et al., 2010; Qian et al., 2013).

Three genes are non-functional in the apple cp genome: ycf15, infA and ycf68. Both ycf15 and ycf68 contain four internal stop codons. These two pseudogenes have rarely been mentioned in previous studies (Ravi et al., 2007; Shi et al., 2013) and were not annotated in the other four reported Rosaceae cp genomes. The validity of ycf15 as a protein-coding gene has long been questioned (Chumley et al., 2006; Steane, 2005), although Shi et al. (2013) suggested that the ycf15 gene is transcribed as part of a precursor polycistronic transcript containing ycf2, ycf15 and antisense trnL-CAA in the Camellia transcriptome. This gene is disabled in some angiosperms such as Amborella (Goremykin et al., 2003) and Nuphar (Raubeson et al., 2007), monocots, most rosids, and some other separate lineages (Shi et al., 2013). In apple, the imperfect ycf15 gene is probably a remnant of a functional gene in one of its predecessors. The ycf68 sequence, which occurs in the trnI-GAU intron, has been reported to be a functional protein-coding gene in rice, maize and Pinus (Raubeson et al., 2007). However, Raubeson et al. (2007) analyzed this gene in 14 angiosperms and found multiple frameshifts causing internal stop codons in most cases, which is confirmed again in apple. The infA gene, which codes for translation initiation factor 1, stands out as an unusually unstable angiosperm chloroplast gene: it has been lost from the chloroplast genome on many separate occasions, especially in eurosids, and transferred to the nucleus multiple times (Millen et al., 2001). Three eurosid taxa (Eucalyptus, Populus and Jatropha) contain infA, but it was shown to be a pseudogene with multiple stop codons (Asif et al., 2010). Our results tell the same story for infA in apple. Why these three genes have degenerated in some land plant cp genomes deserves further study.

Most repeats are located in the intergenic spacers and introns, but several occur in tRNA genes and CDS. Short dispersed repeats are considered to be one of the major factors promoting cp genome recombination and rearrangement because they are common in highly rearranged algal and angiosperm genomes, and many rearrangement endpoints are associated with such repeats (Lee et al., 2007; Yue et al., 2007; Haberle et al., 2008; Pombert et al., 2005; Chumley et al., 2006). In un-rearranged cp genomes, most of the repeats are located in intergenic spacer regions and introns, although several are located in the protein-coding genes psaA, psaB and ycf2 (Daniell et al., 2006; Timme et al., 2007; Saski et al., 2005). Among the five available Rosaceae cp genomes, repeat analysis has been carried out here for the first time, for apple, which will provide more informative sources for developing markers for population and phylogeny studies.

In our study, we detected 237 SSRs with uneven distribution in the apple cp genome. Most of the SSRs were found in the noncoding regions, which is not unusual given the higher number of mutations within these regions compared with the more conserved coding regions (Ebert and Peakall, 2009). Additionally, there was a markedly larger number of A and T microsatellites than G and C microsatellites, as has been reported previously in other taxa (Kuang et al., 2011; Qian et al., 2013; Raubeson et al., 2007). The SSR is another repeat type, based on a simpler motif and shorter than the aforementioned repeats. SSRs have been used to obtain high resolution in some closely related plant taxa, proving to be effective genetic markers for studying plant breeding, population genetics, biological conservation, mating systems, and uniparental lineages (Terrab et al., 2006; Cardle et al., 2000; Peakall et al., 1998). By analyzing the complete chloroplast genome of apple, we hope to facilitate future studies by selecting target regions for more in-depth population studies within the genus.

3.2 Implications for Chloroplast Genetic Engineering

Chloroplast genetic engineering is exemplary for its unique advantages, including the possibility of multi-gene engineering in a single transformation event, transgene containment due to maternal inheritance, high levels of transgene expression and lack of gene silencing (Daniell, 2007; Verma and Daniell, 2007; Verma et al., 2008). Foreign gene integration into the chloroplast genome occurs via homologous recombination of flanking sequences used in chloroplast vectors (Verma and Daniell, 2007). Chloroplast transformation has made significant progress in the model species tobacco as well as in a few major crops, such as potato, tomato and cotton (Verma et al., 2008; Verma and Daniell, 2007). Although the trnI-trnA and accD-rbcL intergenic spacer regions have been widely used as gene introduction sites for vector construction (Verma et al., 2008), the transformation efficiency is impaired when the sequences for homologous recombination are divergent among distantly related species (Ruhlman et al., 2006). However, spacer regions are not 100% identical even in members of the same family. Comparison of intergenic spacer regions among members of Solanaceae revealed that only four regions are identical (Daniell et al., 2006). Similarly, comparison of intergenic spacer regions of nine grass cp genomes revealed that not even a single spacer region is identical among all sequenced cp genomes (Saski et al., 2007). Terakami et al. (2012) investigated several deletions and insertions in the intergenic spacer regions among the Pyrus, Malus and Prunus cp genomes, such as ndhC-trnV, trnR-atpA, rpl33-rps18, psbI-trnS and accD-psaI. No intergenic spacer region is 100% identical across the available Rosaceae cp genomes. The availability of the complete cp genome sequence of apple helps to identify the optimal intergenic spacers for transgene integration and to develop site-specific cp transformation vectors. Using cp genetic engineering to introduce useful traits, such as pest resistance and drought tolerance, might be another way to improve this economically important plant.

References:

Asif MH, Mantri SS, Sharma A et al., 2010. Complete sequence and organisation of the Jatropha curcas (Euphorbiaceae) chloroplast genome [J]. Tree Genetics & Genomes, 6: 941-952
Benson G, 1999. Tandem repeats finder: A program to analyze DNA sequences [J]. Nucleic Acids Research, 27: 573-580
Cardle L, Ramsay L, Milbourne D et al., 2000. Computational and experimental characterization of physically clustered simple sequence repeats in plants [J]. Genetics, 156: 847-854
Chumley TW, Palmer JD, Mower JP et al., 2006. The complete chloroplast genome sequence of Pelargonium × hortorum: Organization and evolution of the largest and most highly rearranged chloroplast genome of land plants [J]. Molecular Biology and Evolution, 23: 2175-2190
Daniell H, 2007. Transgene containment by maternal inheritance: Effective or elusive? [J]. Proceedings of the National Academy of Sciences of the United States of America, 104: 6879-6880
Daniell H, Lee SB, Grevich J et al., 2006. Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes [J]. Theoretical and Applied Genetics, 112: 1503-1518
Downie SR, Palmer JD, 1992. Use of chloroplast DNA rearrangements in reconstructing plant phylogeny [A]. In: Soltis PS, Soltis DE, Doyle JJ (eds.), Molecular Systematics of Plants [M]. New York: Chapman and Hall, 14-35
Ebert D, Peakall R, 2009. Chloroplast simple sequence repeats (cpSSRs): Technical resources and recommendations for expanding cpSSR discovery and applications to a wide array of plant species [J]. Molecular Ecology Resources, 9: 673-690
Givnish TJ, Ames M, McNeal JR et al., 2010. Assembling the tree of the monocotyledons: Plastome sequence phylogeny and evolution of Poales [J]. Annals of the Missouri Botanical Garden, 97: 584-616
Goremykin VV, Hirsch-Ernst KI, Wölfl S et al., 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm [J]. Molecular Biology and Evolution, 20: 1499-1505
Gray MW, 1989. The evolutionary origins of organelles [J]. Trends in Genetics, 5: 294-299
Guisinger MM, Chumley TW, Kuehl JV et al., 2010. Implications of the plastid genome sequence of Typha (Typhaceae, Poales) for understanding genome evolution in Poaceae [J]. Journal of Molecular Evolution, 70: 149-166
Haberle RC, Fourcade HM, Boore JL et al., 2008. Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes [J]. Journal of Molecular Evolution, 66: 350-361
Hiratsuka J, Shimada H, Whittier R et al., 1989. The complete sequence of the rice (Oryza sativa) chloroplast genome: Intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals [J]. Molecular and General Genetics, 217: 185-194
Howe CJ, Barbrook AC, Koumandou VL et al., 2003. Evolution of the chloroplast genome [J]. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences, 358: 99-106
Huotari T, Korpelainen H, 2012. Complete chloroplast genome sequence of Elodea canadensis and comparative analyses with other monocot plastid genomes [J]. Gene, 508: 96-105
Jansen RK, Cai Z, Raubeson LA et al., 2007. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns [J]. Proceedings of the National Academy of Sciences of the United States of America, 104: 19369-19374
Jansen RK, Saski C, Lee SB et al., 2011. Complete plastid genome sequences of three rosids (Castanea, Prunus, Theobroma): Evidence for at least two independent transfers of rpl22 to the nucleus [J]. Molecular Biology and Evolution, 28: 835-847
Kim KJ, Lee HL, 2004. Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants [J]. DNA Research, 11: 247-261
Kuang DY, Wu H, Wang YL et al., 2011. Complete chloroplast genome sequence of Magnolia kwangsiensis (Magnoliaceae): Implication for DNA barcoding and population genetics [J]. Genome, 54: 663-673
Kurtz S, Choudhuri JV, Ohlebusch E et al., 2001. REPuter: The manifold applications of repeat analysis on a genomic scale [J]. Nucleic Acids Research, 29: 4633-4642
Lee C, Hong SP, 2011. Phylogenetic relationships of the rare Korean monotypic endemic genus Pentactina Nakai in the tribe Spiraeeae (Rosaceae) based on molecular data [J]. Plant Systematics and Evolution, 294: 159-166
Lee HL, Jansen RK, Chumley TW et al., 2007. Gene relocations within chloroplast genomes of Jasminum and Menodora (Oleaceae) are due to multiple, overlapping inversions [J]. Molecular Biology and Evolution, 24: 1161-1180
Liu Y, Huo NX, Dong LL et al., 2013. Complete chloroplast genome sequences of Mongolia medicine Artemisia frigida and phylogenetic relationships with other plants [J]. PLOS ONE, 8: e57533
Lohse M, Drechsel O, Bock R, 2007. OrganellarGenomeDRAW (OGDRAW): A tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes [J]. Current Genetics, 52: 267-274
Martin G, Baurens FC, Cardi C et al., 2013. The complete chloroplast genome of banana (Musa acuminata, Zingiberales): Insight into plastid monocotyledon evolution [J]. PLOS ONE, 8: e67350
Millen RS, Olmstead RG, Adams KL et al., 2001. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus [J]. The Plant Cell, 13: 645-658
Nie XJ, Lv SZ, Zhang YX et al., 2012. Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora) [J]. PLOS ONE, 7: e36869
Palmer JD, 1991. Plastid chromosomes: Structure and evolution [A]. In: Bogorad L, Vasil IK (eds.), Cell Culture and Somatic
