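Plant Diversity and Resources (植物分类与资源学报), 2014, 36 (4): 468~484
DOI 10.7677/ynzwyj201413188

Characterization of the Apple Chloroplast Genome*

JIN Gui-Hua 1,2, CHEN Si-Yun 1, YI Ting-Shuang 1, ZHANG Shu-Dong 1**
(1 Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; 2 University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract: Apple (Malus × domestica) is one of the most important temperate fruits. To better understand the molecular biology of this species, we analyzed the structural features of the published complete chloroplast genome sequence of apple. The apple chloroplast genome is 160 068 bp long and has the circular quadripartite structure typical of angiosperm chloroplast genomes, comprising a large single-copy region (LSC), a small single-copy region (SSC) and two inverted repeat regions (IRs) of 88 184 bp, 19 180 bp and 26 352 bp, respectively. The genome contains 135 genes (20 of them lie in the inverted repeat regions, so the genome holds 115 different genes). Classified by function, these 115 genes comprise 81 protein-coding genes, 4 rRNA genes and 30 tRNA genes. Three of them, ycf15, ycf68 and infA, contain multiple stop codons and are presumed to be pseudogenes. The genome structure, gene order, GC content and codon usage bias of apple are all similar to those of typical angiosperm chloroplast genomes. Thirty repeats longer than 30 bp were detected in the apple chloroplast genome, including 21 tandem repeats, 6 forward repeats and 3 inverted repeats; 237 simple sequence repeat (SSR) loci were also detected, most of which are composed of A or T. In addition, every 10 000 bp of non-coding sequence contains 24 SSR loci on average, whereas coding sequence contains 5 on average, showing that SSRs are unevenly distributed across the chloroplast genome. The sequence features of the apple chloroplast genome reported here will help promote studies of the population genetics, phylogeny and chloroplast genetic engineering of this species.
Key words: Apple; Chloroplast genome; Repeat analysis; SSRs
CLC number: Q75    Document code: A    Article ID: 2095-0845(2014)04-468-17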
Characterization of the Complete Chloroplast Genome of Apple (Malus × domestica, Rosaceae)*

JIN Gui-Hua 1,2, CHEN Si-Yun 1, YI Ting-Shuang 1, ZHANG Shu-Dong 1**
(1 Germplasm Bank of Wild Species in Southwest China, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China; 2 University of Chinese Academy of Sciences, Beijing 100049, China)

Abstract: Apple (Malus × domestica) is one of the most important temperate fruits. To better understand the molecular basis of this species, we characterized the complete chloroplast (cp) genome sequence downloaded from the Genome Database for Rosaceae. The cp genome of apple is a circular molecule of 160 068 bp with a typical quadripartite structure: two inverted repeats (IRs) of 26 352 bp each, separated by a small single-copy region (SSC) of 19 180 bp and a large single-copy region (LSC) of 88 184 bp. A total of 135 predicted genes were identified (115 unique genes, with another 20 duplicated in the IRs); the unique genes comprise 81 protein-coding genes, four rRNA genes and 30 tRNA genes. Three genes, ycf15, ycf68 and infA, contain several internal stop codons and were interpreted as pseudogenes. The genome structure, gene order, GC content and codon usage of apple are similar to those of typical angiosperm cp genomes. Thirty repeat regions (≥30 bp) were detected, of which twenty-one are tandem, six are forward and three are inverted repeats. Two hundred thirty-seven simple sequence repeat (SSR) loci were revealed, and most of them are composed of A or T, contributing to a distinct bias in base composition. Additionally, every 10 000 bp of non-coding sequence contains 24 SSR sites on average, whereas protein-coding sequence contains five, indicating an uneven distribution of SSRs. The complete cp genome sequence of apple reported in this paper will facilitate future studies of its population genetics, phylogenetics and chloroplast genetic engineering.
Key words: Apple; Chloroplast genome; Repeat analysis; SSRs

* Funding: The Ministry of Science and Technology, China, Basic Research Project (2013FY112600), the National Natural Science Foundation of China (31200172), and the Talent Project of Yunnan Province (2011CI042)
** Author for correspondence; E-mail: sdchang@mail.kib.ac.cn
Received date: 2013-09-16, Accepted date: 2013-11-18
About the author: JIN Gui-Hua (1988-), female, master's candidate, mainly engaged in plant bioinformatics research. E-mail: jinguihua@mail.kib.ac.cn
Apple, Malus × domestica Borkh., belongs to the tribe Pyreae of Rosaceae (Potter et al., 2007) and is cultivated all over the world except in tundra climates and the arctic regions. It is one of the oldest and most economically important temperate fruits. Globally, there are more than 7 500 known apple cultivars, covering a wide range of desired characteristics. According to data from the Food and Agriculture Organization of the United Nations, total apple production in 2010 was about 69 million tons, and the overall area of apple plantation was 5.62 million hectares (www.fao.org). Apple is considered to have high economic value, but the species is highly susceptible to a number of fungal and bacterial diseases and insect pests, which annually reduce the harvest by 12% to 25%. However, introducing or deleting target genes by conventional hybridization is generally costly, inefficient and slow because of the high heterozygosity and long juvenile period of apple plants.

Chloroplasts (cp), considered to have originated from cyanobacteria through endosymbiosis, are the photosynthetic organelles that provide essential energy for plants and algae (Howe et al., 2003; Gray, 1989). This intracellular organelle encodes a number of chloroplast-specific components and is involved in major functions such as sugar synthesis, starch storage, the production of several amino acids, lipids, vitamins and pigments, and also in key sulfur and nitrogen metabolic pathways (Martin et al., 2013). Earlier studies based on restriction site mapping demonstrated that gene content, gene order and genome organization of the cp genome are largely conserved within land plants (Raubeson and Jansen, 2005; Palmer, 1991). However, with the increasing number of whole cp genomes available, many structural rearrangements, large IR expansions/contractions and gene losses have been found (Chumley et al., 2006; Millen et al., 2001; Guisinger et al., 2010). These events, coupled with the sequences per se, provide sufficient information for genome-wide evolutionary studies. The cp genome has shown great potential in resolving phylogenetic questions at both high and low taxonomic levels, and complete cp genome sequences are sometimes necessary for resolving complex evolutionary relationships (Givnish et al., 2010; Downie and Palmer, 1992; Jansen et al., 2007). Meanwhile, comparative analysis of cp genomes from distant and closely related species will facilitate the association of important traits controlled by plastid genomes (Liu et al., 2013).

Velasco et al. (2010) reported a high-quality draft genome sequence of apple and reconstructed the phylogeny of the genus Malus using 23 nuclear genes; the progenitor of the cultivated apple was identified as M. sieversii. Compared with the nuclear genome, our understanding of the apple cp genome lags behind, although the complete chloroplast genome of apple was released along with the nuclear genome sequence (Velasco et al., 2010, http://www.rosaceae.org/projects/apple_genome). In this article, we annotated the cp genome of apple in detail. In addition, we determined the distribution and location of microsatellites (SSRs) and repeats in the apple cp genome. The cp genome information obtained should be useful for population genetics and breeding programs of apple.

1 Materials and methods
1.1 Genome annotation
Velasco et al. (2010) have assembled the Malus ×
domestica cp genome sequence with 847× coverage. This high-quality cp genome sequence can be downloaded from GDR, the Genome Database for Rosaceae (http://www.rosaceae.org/projects/apple_genome). The cp genome was annotated using the program DOGMA (Wyman et al., 2004), coupled with manual corrections for start and stop codons and intron/exon boundaries. The tRNA genes were identified using DOGMA and tRNAscan-SE (Schattner et al., 2005). Codon usage was analyzed using a VB script. The circular cp genome map was drawn using the OGDRAW program (Lohse et al., 2007).

1.2 Repeat analysis
REPuter (Kurtz et al., 2001) was used to visualize both forward and inverted repeats. The minimal repeat size was set to 30 bp and the identity of repeats was no less than 90% (Hamming distance equal to 3). Tandem repeats were analyzed using Tandem Repeats Finder (TRF) v4.04 (Benson, 1999) with parameter settings as described by Nie et al. (2012). Overlapping repeats were merged into one repeat motif whenever possible. A given region in the genome was assigned to only one repeat type, and tandem repeats took priority when a repeat motif could be identified as both a tandem repeat and another type.

1.3 SSR analysis
We detected SSRs of 8 bp or longer in the apple cp genome. This threshold was set because SSRs of 8 bp or longer are prone to slipped-strand mispairing, which is thought to be the primary mutational mechanism causing their high level of polymorphism (Huotari and Korpelainen, 2012; Raubeson et al., 2007; Rose and Falush, 1998). Detection of microsatellites (mono-, di-, tri-, tetra-, penta- and hexa-nucleotide repeats) was performed using MISA (Thiel et al., 2003) with minimum repeat numbers of 8, 4, 4, 3, 3 and 3 for unit sizes of 1, 2, 3, 4, 5 and 6, respectively. SSR analysis considered only one inverted repeat region (IRb). All of the repeats found were manually verified, and redundant results were removed.

2 Results
2.1 Genome organization
The complete cp genome of apple is a circular DNA molecule of 160 068 bp with a quadripartite structure typical of the majority of land plant chloroplast chromosomes. It is the largest cp genome among the five Rosaceae species compared (Table 1). The cp genome harbors a pair of identical inverted repeat regions (IRa and IRb) of 26 352 bp each. The inverted repeat regions are separated by the large (LSC) and small (SSC) single-copy regions of 88 184 and 19 180 bp, respectively (Table 1, Fig. 1). The IRs span from rps19 to a portion of ycf1. The overall GC content of the apple cp genome is 36.5%, with 42.7% within the inverted repeat regions and 34.2% and 30.4% within the LSC and SSC, respectively (Table 2). The high GC content of the IRs is caused by the four GC-rich rRNA genes (with an average GC content of 55.5%).

2.2 Gene content
The positions of all the genes identified in the apple cp genome and the category-wise distribution of these genes are presented in Figure 1 and Table 3. The apple cp genome encodes 135 predicted genes, of which 115 are unique. The unique genes include
Table 1  Summary of the Rosaceae cp genome features

Taxon                        GenBank    Genome size/bp  LSC length/bp  IRa length/bp  SSC length/bp  Reference
Fragaria vesca subsp. vesca  NC_015206  155691          85606          25555          18175          Shulaev et al., 2011
Pentactina rupicola          NC_016921  156612          84970          26350          19237          Lee and Hong, 2011
Prunus persica               NC_014697  157790          85969          26381          19060          Jansen et al., 2011
Pyrus pyrifolia              NC_015996  159922          87901          26392          19237          Terakami et al., 2012
Malus × domestica            -          160068          88184          26352          19180          This study
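The quadripartite layout lends itself to a quick arithmetic cross-check: one LSC, one SSC and two IR copies should add up to the full genome length. The sketch below uses the apple region lengths stated in the text (LSC 88 184 bp, SSC 19 180 bp, each IR 26 352 bp) and the genome sizes of the other four Rosaceae species; the function and variable names are illustrative and not part of the original analysis.

```python
# Region lengths (bp) for apple as stated in the text.
LSC = 88_184
SSC = 19_180
IR = 26_352          # length of each inverted repeat copy (IRa and IRb)
APPLE_GENOME = 160_068

# Genome sizes (bp) of the other four Rosaceae cp genomes compared in the paper.
OTHER_GENOMES = {
    "Pyrus pyrifolia": 159_922,
    "Prunus persica": 157_790,
    "Pentactina rupicola": 156_612,
    "Fragaria vesca subsp. vesca": 155_691,
}


def quadripartite_total(lsc, ssc, ir):
    """Length of a circular plastome made of one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def size_differences(reference, others):
    """How many bp larger the reference genome is than each other genome."""
    return {taxon: reference - size for taxon, size in others.items()}


if __name__ == "__main__":
    print(quadripartite_total(LSC, SSC, IR) == APPLE_GENOME)
    for taxon, diff in size_differences(APPLE_GENOME, OTHER_GENOMES).items():
        print(f"{taxon}: apple is {diff} bp larger")
```

The same check can be applied to any row of a plastome summary table to spot transcription errors in region lengths.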
81 protein-coding, 30 tRNA and four rRNA genes (Table 3). Nine protein-coding, seven tRNA and all four rRNA genes are duplicated in the IR regions. Protein-coding genes, tRNAs and rRNAs make up
Fig. 1  Map of the apple cp genome
The thick lines indicate the extent of the IRs (IRa and IRb), which separate the genome into SSC and LSC regions. Genes lying outside the map are transcribed clockwise, whereas genes inside are transcribed counterclockwise. Genes belonging to different functional groups are color coded. The dashed area in darker gray in the inner circle indicates GC content, while the lighter gray corresponds to the AT content of the genome.
Table 2  Base composition in the apple chloroplast genome

Genome feature / codon position  A/%   T(U)/%  G/%   C/%   Length/bp
LSC                              32.2  33.6    16.6  17.6  88184
SSC                              34.8  34.8    14.5  15.9  19180
IRa                              28.6  28.7    20.6  22.1  26352
IRb                              28.7  28.6    22.1  20.6  26352
Total                            31.5  32.1    17.9  18.6  160068
CDS                              30.9  31.4    20.1  17.5  79650
  1st position                   31.0  23.8    26.7  18.5  26550
  2nd position                   29.6  32.5    17.8  20.1  26550
  3rd position                   32.2  38.0    15.9  13.9  26550

CDS: coding DNA sequence
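The overall GC content quoted in the text (36.5%) should follow from the per-region values in Table 2 as a length-weighted average. A minimal sketch of that calculation is given below, using the G and C percentages and region lengths from Table 2; the data structure and function names are assumptions made for illustration.

```python
# (G %, C %, length in bp) for each region, taken from Table 2.
REGIONS = {
    "LSC": (16.6, 17.6, 88_184),
    "SSC": (14.5, 15.9, 19_180),
    "IRa": (20.6, 22.1, 26_352),
    "IRb": (22.1, 20.6, 26_352),
}


def region_gc(regions):
    """GC percentage of each region (G % + C %), rounded to one decimal."""
    return {name: round(g + c, 1) for name, (g, c, _) in regions.items()}


def genome_gc(regions):
    """Length-weighted GC percentage over all regions."""
    total_length = sum(length for _, _, length in regions.values())
    gc_bases = sum((g + c) / 100 * length for g, c, length in regions.values())
    return 100 * gc_bases / total_length


if __name__ == "__main__":
    print(region_gc(REGIONS))
    print(round(genome_gc(REGIONS), 1))
```

Because the two IR copies together cover about a third of the genome and are the most GC-rich regions, they pull the weighted average above the LSC and SSC values.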
47.9%, 1.7% and 5.4% of the genome, respectively, while introns and intergenic spacers constitute the remaining 45.0%. The LSC region contains 61 protein-coding genes and 22 tRNA genes, whereas the SSC region contains 11 protein-coding genes and one tRNA gene. Eighteen genes in the apple cp genome contain introns, three of which (clpP, rps12 and ycf3) contain two introns (Table 4). The trnK-UUU gene has the largest intron (2 516 bp), within which another gene, matK, is nested. For the rps12 gene, the 5' exon is located in the LSC region, and the 3' exon is located in the IR regions. The ycf1 and rps19 genes are located in the boundary regions between IRb/SSC and IRa/LSC, respectively. Incomplete duplications of the normal copies of ycf1 and rps19 at these boundaries have resulted in a lack of protein-coding ability. The psbD-psbC and ycf1-ndhF pairs are two cases of overlapping genes.

2.3 Codon usage
Based on the sequences of protein-coding genes
Table 3  Genes present in the apple chloroplast genome

Group of genes                Gene names
Photosystem I                 psaA, psaB, psaC, psaI, psaJ
Photosystem II                psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Cytochrome b/f complex        petA, petB*, petD*, petG, petL, petN
ATP synthase                  atpA, atpB, atpE, atpF*, atpH, atpI
NADH dehydrogenase            ndhA*, ndhB*(×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
RubisCO large subunit         rbcL
RNA polymerase                rpoA, rpoB, rpoC1*, rpoC2
Ribosomal proteins (SSU)      rps2, rps3, rps4, rps7(×2), rps8, rps11, rps12**(×2), rps14, rps15, rps16*, rps18, rps19
Ribosomal proteins (LSU)      rpl2*(×2), rpl14, rpl16*, rpl20, rpl22, rpl23(×2), rpl32, rpl33, rpl36
Other genes                   clpP**, matK, accD, ccsA, infA, cemA
Proteins of unknown function  ycf1, ycf2(×2), ycf3**, ycf4, ycf15(×2), ycf68(×2)
Transfer RNAs                 37 tRNAs (6 contain an intron, 7 in the IRs)
Ribosomal RNAs                rrn4.5(×2), rrn5(×2), rrn16(×2), rrn23(×2)

One or two asterisks after a gene name indicate that the gene contains one or two introns, respectively.
Table 4  The genes with introns in the apple cp genome and the lengths of the exons and introns

Gene      Location  Exon I/bp  Intron I/bp  Exon II/bp  Intron II/bp  Exon III/bp
atpF      LSC       411        733          144
clpP      LSC       228        650          291         824           69
ndhA      SSC       540        1141         552
ndhB      IR        756        670          777
petB      LSC       6          798          642
petD      LSC       9          725          474
rpl16     LSC       399        989          9
rpl2      IR        435        687          390
rpoC1     LSC       1611       742          435
rps12*    LSC       114        -            26          542           231
rps16     LSC       231        860          42
trnA-UGC  IR        38         808          35
trnG-GCC  LSC       23         707          37
trnI-GAU  IR        42         944          35
trnK-UUU  LSC       35         2516         37
trnL-UAA  LSC       37         515          50
trnV-UAC  LSC       37         593          39
ycf3      LSC       153        745          228         709           126

* rps12 is a trans-spliced gene with the 5' end located in the LSC region and the duplicated 3' end in the IR regions.
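One simple sanity check on exon annotations such as those in Table 4 is that the exons of a protein-coding gene should sum to a whole number of codons. The sketch below applies that check to a few rows, with exon lengths copied as printed in Table 4; the helper name is hypothetical, and a non-zero remainder only suggests that a row is worth re-checking against the annotation, not that the gene is non-functional.

```python
# Exon lengths (bp) for a few protein-coding genes, as printed in Table 4.
EXONS = {
    "atpF": [411, 144],
    "clpP": [228, 291, 69],
    "ndhA": [540, 552],
    "rps12": [114, 26, 231],
    "ycf3": [153, 228, 126],
}


def coding_summary(exon_lengths):
    """Return (total coding length, whole codons, leftover bases) for one gene."""
    total = sum(exon_lengths)
    return total, total // 3, total % 3


if __name__ == "__main__":
    for gene, exons in EXONS.items():
        total, codons, leftover = coding_summary(exons)
        flag = "" if leftover == 0 else "  <- not a multiple of 3, re-check"
        print(f"{gene}: {total} bp, {codons} codons{flag}")
```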
and tRNA genes within the chloroplast genome, the relative synonymous codon usage (RSCU) (Sharp and Li, 1986) was deduced for the apple genome and is summarized in Supplementary Table 1. The codon usage of the apple chloroplast genome strongly reflects the AT bias. Within the coding DNA sequence (CDS), the AT content of the first, second and third codon positions is 54.8%, 62.1% and 70.2%, respectively (Table 2). Moreover, the 81 protein-coding genes comprise 79 650 bp coding for 26 550 codons. Among these codons, 2 781 (10.5%) encode leucine and 307 (1.1%) encode cysteine, which are the most and least prevalent amino acids, respectively. The highest codon usage was observed for ATT, encoding isoleucine (Ile). High codon usage was also observed for Lysine (Lys) and Glutamine (Glu) (Supplementary Table 1). Instead of the common ATG start codon, we identified GTG as the start codon for rps19. All three stop codons are present, with UAA being the most frequently used (UAA 58.8%, UAG 23.3% and UGA 17.8%).

2.4 Non-functional genes
The ycf15 gene employs GTG as its start codon, and several stop codons were detected, which indicates that it is most likely a non-functional gene. The reading frame of this gene contains one inserted 'ACTA' unit, causing a frameshift and the resulting internal stop codons (Fig. 2: A). On the other hand, the ycf68 gene is a truncated pseudogene with accumulated stop codons in its reading frame, caused by the absence of one 'AAAC' unit and two deletion events (13 bp in total) (Fig. 2: B). We also found that the infA gene is probably non-functional in the apple chloroplast genome, owing to several premature stop codons caused by the insertion of one 'TATC' unit (Fig. 2: C).

2.5 Repeat analysis
For repeat structure analysis, we detected six direct, three inverted and 21 tandem repeats in the apple cp genome (Supplementary Table 2). Most of these repeats are between 30 and 41 bp long (Fig. 3: A). The longest repeat, of 91 bp, is located in the intergenic region between psbZ and trnG-UCC within the LSC. Tandem repeats, accounting for 70% of total repeats, are the most common of the three repeat types (Fig. 3: B). Most of the repeats (76%) are distributed within the intergenic spacer regions, with 8% in the introns, 8% in the CDS regions and 8% in the tRNA genes (Fig. 3: C).

2.6 SSR analysis
Chloroplast simple sequence repeats (SSRs) of apple were examined and are listed in Supplementary Table 3, along with their nucleotide sequences and positions within the cp genome. We identified 237 SSR loci (≥8 bp) in total, of which 164 are mononucleotide, 68 dinucleotide, four tetranucleotide and one hexanucleotide repeats. Among these cpSSRs, the longest is a poly-T of 26 bp. The majority of mononucleotide repeats are composed of A (64) or T (94), while only six are composed of tandem G or C. The majority of repeats are 8 to 10 bp long (62 of 8 bp, 39 of 9 bp and 21 of 10 bp), accounting for 51.48% (122/237) of all cpSSRs. CpSSRs are unevenly distributed across the genome: 175 in the LSC, 23 in the IRb and 39 in the SSC regions. Analysis of function-related location revealed 158 cpSSRs in intergenic spacer regions, 38 in introns and 41 in the CDS of 18 genes, 17 of which were found to harbor at least two SSRs.

3 Discussion
3.1 Genome Organization
In general, the size of photosynthetic land plant plastid chromosomes ranges from 108 kb to 165 kb (Palmer, 1991; Raubeson and Jansen, 2005). The cp genome of apple is near the upper boundary, and is also the largest among the five available Rosaceae cp genomes. It is about 0.1 kb, 2.2 kb, 3.4 kb and 4.3 kb larger than those of Pyrus pyrifolia, Prunus persica, Pentactina rupicola and Fragaria vesca subsp. vesca, respectively. The genome size variation is mainly caused by differences in the length of the SSC and IR regions (Table 1).
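The SSR search settings given in Section 1.3 (minimum repeat numbers of 8, 4, 4, 3, 3 and 3 for unit sizes 1 to 6) can be illustrated with a short regular-expression scan. This is a minimal sketch of the thresholding idea only: it is not MISA itself, it does not merge compound SSRs or restrict the search to one IR copy, and the function name and the test sequence are made up for illustration.

```python
import re

# Minimum number of repeats for each motif length (1 to 6 nt), per Section 1.3.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A motif of `unit` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter motif repeated (e.g. "TT", "ATAT"),
            # since those runs are already reported under the shorter unit size.
            redundant = any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit)
                if unit % k == 0
            )
            if not redundant:
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGC" + "T" * 10 + "CA" + "AT" * 5 + "GCC"  # hypothetical sequence
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

With these thresholds the shortest reportable SSR is 8 bp (eight mononucleotide or four dinucleotide units), which matches the ≥8 bp cutoff used in the results.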
Fig. 2  Alignment of three pseudogenes
A. Alignment of the ycf15 gene and protein sequences in the two representative species of angiosperms [Nicotiana (NC_001879) and Atropa (NC_004561)]. Black asterisks indicate stop codons in the protein. Red arrows indicate the insertion region 'TCTA' in apple. B. Alignment of the ycf68 gene and protein sequences in the two representative species of angiosperms [Zea (NC_001666) and Oryza (NC_001320)]. Red arrows of box indicate the 'AAAC' unit missing in apple. C. Alignment of the infA gene and protein sequences in the three representative species of angiosperms [Vitis (NC_007957) and Sesamum (NC_016433)]. Red arrows indicate the 'AGAT' unit missing in apple.
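The pseudogene argument rests on a simple mechanism: an insertion or deletion whose length is not a multiple of three shifts the reading frame, and the shifted frame tends to run into stop codons early. The sketch below shows this on a made-up 24-nt coding sequence with a 4-bp unit inserted; the sequence is a hypothetical toy, not the apple ycf15, ycf68 or infA sequence, and the function name is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(cds):
    """Count in-frame stop codons before the last complete codon of a sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return sum(codon in STOP_CODONS for codon in codons[:-1])


if __name__ == "__main__":
    intact = "ATGGCTGAATTAGCTAAAGGTTAA"          # toy ORF: 7 sense codons + TAA
    shifted = intact[:6] + "ACTA" + intact[6:]   # 4-bp insertion after codon 2
    print(internal_stops(intact))
    print(internal_stops(shifted))
```

In the toy example the intact frame has no internal stop, while the frame after the 4-bp insertion reads ...AGC TAA..., so a premature stop appears a few codons downstream of the insertion.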
Fig. 3  Repeat structure analysis in the apple cp genome
The cutoff value is 15 bp for tandem repeats and 30 bp for dispersed repeats. (A) Histogram showing the number of repeats in the apple chloroplast genome. (B) Composition of the 30 repeats. (C) Location of the 30 repeats.
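The dispersed-repeat criteria (a minimum size of 30 bp with at most 3 mismatches, i.e. at least 90% identity) can be made concrete with a brute-force window comparison. This is only a sketch of the matching rule under those assumptions: REPuter uses a far more efficient algorithm, this version reports every qualifying window pair rather than merged maximal repeats, and its quadratic cost makes it suitable for short test sequences only. All names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def find_repeats(seq, size=30, max_mismatch=3):
    """Return (kind, i, j) for non-overlapping window pairs of length `size`.

    kind "F": window i matches window j (forward repeat);
    kind "I": window i matches the reverse complement of window j (inverted repeat).
    A pair is reported when the Hamming distance is at most `max_mismatch`.
    """
    seq = seq.upper()
    n_windows = len(seq) - size + 1
    hits = []
    for i in range(n_windows):
        a = seq[i:i + size]
        for j in range(i + size, n_windows):
            b = seq[j:j + size]
            if hamming(a, b) <= max_mismatch:
                hits.append(("F", i, j))
            if hamming(a, revcomp(b)) <= max_mismatch:
                hits.append(("I", i, j))
    return hits


if __name__ == "__main__":
    unit = "GATTACA"  # hypothetical 7-nt unit, searched with size=7 for brevity
    print(find_repeats(unit + "CC" + unit, size=7, max_mismatch=0))
    print(find_repeats(unit + "CC" + revcomp(unit), size=7, max_mismatch=0))
```

The defaults mirror the settings described in the methods (30 bp, Hamming distance 3); the usage lines shrink the window to 7 nt so the behaviour is easy to follow by eye.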
The apple cp genome exhibits largely identical gene order and content to most sequenced angiosperm cp genomes, emphasizing the highly conserved nature of land plant cp genomes (Wicke et al., 2011). Its GC content is in accordance with that of typical angiosperm cp genomes (Shinozaki et al., 1986; Kim and Lee, 2004; Hiratsuka et al., 1989; Sato et al., 1999; Terakami et al., 2012). The codon usage bias towards a higher AT representation at the third codon position has also been observed in other land plant cp genomes (Yang et al., 2010; Nie et al., 2012; Yi and Kim, 2012; Tangphatsornruang et al., 2010; Qian et al., 2013).

Three genes are non-functional in the apple cp genome: ycf15, infA and ycf68. Both ycf15 and ycf68 contain four internal stop codons. These two pseudogenes have rarely been mentioned in previous studies (Ravi et al., 2007; Shi et al., 2013) and were not annotated in the other four reported Rosaceae cp genomes. The validity of ycf15 as a protein-coding gene has long been questioned (Chumley et al., 2006; Steane, 2005), although Shi et al. (2013) suggested that the ycf15 gene is transcribed as part of a precursor polycistronic transcript containing ycf2, ycf15 and antisense trnL-CAA in the Camellia transcriptome. This gene is disabled in some angiosperms such as Amborella (Goremykin et al., 2003) and Nuphar (Raubeson et al., 2007), monocots, most rosids, and some other separate lineages (Shi et al., 2013). In apple, the imperfect ycf15 gene is probably a remnant of a functional gene in one of its predecessors. The ycf68 sequence, which occurs in the trnI-GAU intron, has been reported to be a functional protein-coding gene in rice, corn, maize and Pinus (Raubeson et al., 2007). However, Raubeson et al. (2007) analyzed this gene in 14 angiosperms and found multiple frameshifts causing internal stop codons in most cases, as is also seen in apple. The infA gene, which codes for translation initiation factor 1, stands out as an unusually unstable angiosperm chloroplast gene: it has been lost from the chloroplast genome on many separate occasions, especially in eurosids, and transferred to the nucleus multiple times (Millen et al., 2001). Three eurosid taxa (Eucalyptus, Populus and Jatropha) contain infA, but it was shown to be a pseudogene with multiple stop codons (Asif et al., 2010). Our results tell the same story for infA in apple. Why these three genes have degenerated in some land plant cp genomes deserves further study.

Most repeats are located in the intergenic spacers and introns, but several occur in tRNA genes and CDS. Short dispersed repeats are considered to be one of the major factors promoting cp genome recombination and rearrangement because they are common in highly rearranged algal and angiosperm genomes, and many rearrangement endpoints are associated with such repeats (Lee et al., 2007; Yue et al., 2007; Haberle et al., 2008; Pombert et al., 2005; Chumley et al., 2006). In un-rearranged cp genomes, most of the repeats are located in intergenic spacer regions and introns, although several are located in the protein-coding genes psaA, psaB and ycf2 (Daniell et al., 2006; Timme et al., 2007; Saski et al., 2005). The repeat analysis of the apple cp genome is the first such analysis among the five available Rosaceae cp genomes, and it will provide more informative resources for developing markers for population and phylogenetic studies.

In our study, we detected 237 SSRs with an uneven distribution in the apple cp genome. Most of the SSRs were found in the noncoding regions, which is not unusual given the higher number of mutations within these regions compared with the more conserved coding regions (Ebert and Peakall, 2009). Additionally, there was a markedly larger number of A and T microsatellites than G and C ones, as has been reported previously in other taxa (Kuang et al., 2011; Qian et al., 2013; Raubeson et al., 2007). SSRs are another repeat type, based on simpler motifs and shorter than the aforementioned repeats. SSRs have been used to obtain high resolution in some closely related plant taxa, proving to be effective genetic markers for studying plant breeding, population genetics, biological conservation, mating systems and uniparental lineages (Terrab et al., 2006; Cardle et al., 2000; Peakall et al., 1998). By analyzing the complete chloroplast genome of apple, we hope to facilitate future studies by selecting target regions for more in-depth population studies within the genus.

3.2 Implications for Chloroplast Genetic Engineering
Chloroplast genetic engineering is notable for its unique advantages, including the possibility of multi-gene engineering in a single transformation event, transgene containment due to maternal inheritance, high levels of transgene expression and lack of gene silencing (Daniell, 2007; Verma and Daniell, 2007; Verma et al., 2008). Foreign gene integration into the chloroplast genome occurs via homologous recombination of the flanking sequences used in chloroplast vectors (Verma and Daniell, 2007). Chloroplast transformation has made significant progress in the model species tobacco as well as in a few major crops, such as potato, tomato and cotton (Verma et al., 2008; Verma and Daniell, 2007). Although the trnI-trnA and accD-rbcL intergenic spacer regions have been widely used as gene introduction sites for vector construction (Verma et al., 2008), transformation efficiency is impaired when the sequences for homologous recombination are divergent among distantly related species (Ruhlman et al., 2006). Moreover, spacer regions are not 100% identical even among members of the same family. Comparison of intergenic spacer regions among members of Solanaceae revealed that only four regions are identical (Daniell et al., 2006). Similarly, comparison of intergenic spacer regions of nine grass cp genomes revealed that not even a single spacer region is identical among all sequenced cp genomes (Saski et al., 2007). Terakami et al. (2012) investigated several deletions and insertions in the intergenic spacer regions among the Pyrus, Malus and Prunus cp genomes, such as ndhC-trnV, trnR-atpA, rpl33-rps18, psbI-trnS and accD-psaI. No intergenic spacer region shows 100% identity across the available Rosaceae cp genomes. The availability of the complete cp genome sequence of apple helps to identify the optimal intergenic spacers for transgene integration and to develop site-specific cp transformation vectors. Using cp genetic engineering to introduce useful traits, such as pest resistance and drought tolerance, might be another application for improving this economically important plant.

References:
Asif MH, Mantri SS, Sharma A et al., 2010. Complete sequence and organisation of the Jatropha curcas (Euphorbiaceae) chloroplast genome [J]. Tree Genetics & Genomes, 6: 941-952
Benson G, 1999. Tandem repeats finder: A program to analyze DNA sequences [J]. Nucleic Acids Research, 27: 573-580
Cardle L, Ramsay L, Milbourne D et al., 2000. Computational and experimental characterization of physically clustered simple sequence repeats in plants [J]. Genetics, 156: 847-854
Chumley TW, Palmer JD, Mower JP et al., 2006. The complete chloroplast genome sequence of Pelargonium × hortorum: Organization and evolution of the largest and most highly rearranged chloroplast genome of land plants [J]. Molecular Biology and Evolution, 23: 2175-2190
Daniell H, 2007. Transgene containment by maternal inheritance: Effective or elusive? [J]. Proceedings of the National Academy of Sciences of the United States of America, 104: 6879-6880
Daniell H, Lee SB, Grevich J et al., 2006. Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes [J]. Theoretical and Applied Genetics, 112: 1503-1518
Downie SR, Palmer JD, 1992. Use of chloroplast DNA rearrangements in reconstructing plant phylogeny [A]. In: Soltis PS, Soltis DE, Doyle JJ (eds.), Molecular Systematics of Plants [M]. New York: Chapman and Hall, 14-35
Ebert D, Peakall R, 2009. Chloroplast simple sequence repeats (cpSSRs): Technical resources and recommendations for expanding cpSSR discovery and applications to a wide array of plant species [J]. Molecular Ecology Resources, 9: 673-690
Givnish TJ, Ames M, McNeal JR et al., 2010. Assembling the tree of the monocotyledons: Plastome sequence phylogeny and evolution of Poales [J]. Annals of the Missouri Botanical Garden, 97: 584-616
Goremykin VV, Hirsch-Ernst KI, Wölfl S et al., 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm [J]. Molecular Biology and Evolution, 20: 1499-1505
Gray MW, 1989. The evolutionary origins of organelles [J]. Trends in Genetics, 5: 294-299
Guisinger MM, Chumley TW, Kuehl JV et al., 2010. Implications of the plastid genome sequence of Typha (Typhaceae, Poales) for understanding genome evolution in Poaceae [J]. Journal of Molecular Evolution, 70: 149-166
Haberle RC, Fourcade HM, Boore JL et al., 2008. Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes [J]. Journal of Molecular Evolution, 66: 350-361
Hiratsuka J, Shimada H, Whittier R et al., 1989. The complete sequence of the rice (Oryza sativa) chloroplast genome: Intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals [J]. Molecular and General Genetics, 217: 185-194
Howe CJ, Barbrook AC, Koumandou VL et al., 2003. Evolution of the chloroplast genome [J]. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences, 358: 99-106
Huotari T, Korpelainen H, 2012. Complete chloroplast genome sequence of Elodea canadensis and comparative analyses with other monocot plastid genomes [J]. Gene, 508: 96-105
Jansen RK, Cai Z, Raubeson LA et al., 2007. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns [J]. Proceedings of the National Academy of Sciences of the United States of America, 104: 19369-19374
Jansen RK, Saski C, Lee SB et al., 2011. Complete plastid genome sequences of three rosids (Castanea, Prunus, Theobroma): Evidence for at least two independent transfers of rpl22 to the nucleus [J]. Molecular Biology and Evolution, 28: 835-847
Kim KJ, Lee HL, 2004. Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants [J]. DNA Research, 11: 247-261
Kuang DY, Wu H, Wang YL et al., 2011. Complete chloroplast genome sequence of Magnolia kwangsiensis (Magnoliaceae): Implication for DNA barcoding and population genetics [J]. Genome, 54: 663-673
Kurtz S, Choudhuri JV, Ohlebusch E et al., 2001. REPuter: The manifold applications of repeat analysis on a genomic scale [J]. Nucleic Acids Research, 29: 4633-4642
Lee C, Hong SP, 2011. Phylogenetic relationships of the rare Korean monotypic endemic genus Pentactina Nakai in the tribe Spiraeeae (Rosaceae) based on molecular data [J]. Plant Systematics and Evolution, 294: 159-166
Lee HL, Jansen RK, Chumley TW et al., 2007. Gene relocations within chloroplast genomes of Jasminum and Menodora (Oleaceae) are due to multiple, overlapping inversions [J]. Molecular Biology and Evolution, 24: 1161-1180
Liu Y, Huo NX, Dong LL et al., 2013. Complete chloroplast genome sequences of Mongolia medicine Artemisia frigida and phylogenetic relationships with other plants [J]. PLOS ONE, 8: e57533
Lohse M, Drechsel O, Bock R, 2007. OrganellarGenomeDRAW (OGDRAW): A tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes [J]. Current Genetics, 52: 267-274
Martin G, Baurens FC, Cardi C et al., 2013. The complete chloroplast genome of banana (Musa acuminata, Zingiberales): Insight into plastid monocotyledon evolution [J]. PLOS ONE, 8: e67350
Millen RS, Olmstead RG, Adams KL et al., 2001. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus [J]. The Plant Cell, 13: 645-658
Nie XJ, Lv SZ, Zhang YX et al., 2012. Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora) [J]. PLOS ONE, 7: e36869
Palmer JD, 1991. Plastid chromosomes: Structure and evolution [A]. In: Bogorad L, Vasil IK (eds.), Cell Culture and Somatic