ebook img

Identification and characterization of short tandem repeats in the Tibetan macaque genome based on resequencing data PDF

2018·26.1 MB·
by  LiuSan-Xu
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Identification and characterization of short tandem repeats in the Tibetan macaque genome based on resequencing data

ZOOLOGICAL RESEARCH Identification and characterization of short tandem repeats in the Tibetan macaque genome based on resequencing data San-XuLiu1,WeiHou1,Xue-YanZhang1,Chang-JunPeng1,Bi-SongYue1,2,Zhen-XinFan1,JingLi1,2,* 1KeyLaboratoryofBio-ResourceandEco-EnvironmentofMinistryofEducation,CollegeofLifeSciences,SichuanUniversity,Chengdu Sichuan610065,China 2SichuanKeyLaboratoryofConservationBiologyonEndangeredWildlife,CollegeofLifeSciences,SichuanUniversity,ChengduSichuan 610065,China ABSTRACT in STR allele size between the two populations. The polymorphic STR loci identified based on The Tibetan macaque, which is endemic to China, the reference genome showed good amplification is currently listed as a Near Endangered primate efficiency and could be used to study population species by the International Union for Conservation geneticsinTibetanmacaques. Theneighbor-joining of Nature (IUCN)(2017). Short tandem repeats tree classified the five macaques into two different (STRs) refer to repetitive elements of genome branches according to their geographical origin, sequence that range in length from 1–6 bp. They indicating high genetic differentiation between the are found in many organisms and are widely HuangshanandSichuanpopulations. Weelucidated applied in population genetic studies. To clarify the thedistributioncharacteristicsofSTRsintheTibetan distributioncharacteristicsofgenome-wideSTRsand macaquegenomeandprovidedaneffectivemethod understandtheirvariationamongTibetanmacaques, for screening polymorphic STRs. Our results also we conducted a genome-wide survey of STRs with layafoundationforfuturegeneticvariationstudiesof next-generationsequencingoffivemacaquesamples. macaques. Atotalof1077790perfectSTRswereminedfromour Keywords: Tibetan macaque (Macaca thibetana) assembly,withanN50of4966bp. Mono-nucleotide genome; Short tandem repeats; Variation analysis; repeats were the most abundant, followed by tetra- Polymorphism;Next-generationsequencing and di-nucleotide repeats. Analysis of GC content INTRODUCTION and repeats showed consistent results with other macaques. Furthermore,usingSTRanalysissoftware Short tandem repeats (STRs), also known as microsatellites, (lobSTR), we found that the proportion of base pair are highly variable repetitive elements with nucleotide motifs deletions in the STRs was greater than that of insertions in the five Tibetan macaque individuals (P<0.05, t-test). We also found a greater number of Received: 28November2017; Accepted: 04April2018; Online: 11 homozygousSTRsthanheterozygousSTRs(P<0.05, April2018 t-test),withtheEmeiandJianyangTibetanmacaques Foundation items: This research was supported by the State Key showing more heterozygous loci than Huangshan ProgramofNationalNaturalScienceFoundationofChina(31530068), Tibetanmacaques. Theproportionofinsertionsand National Natural Science Foundation of China (31770415), and mean variation of alleles in the Emei and Jianyang SichuanApplicationFoundationProject(2015JY0268). individuals were slightly higher than those in the *Correspondingauthor,E-mail:[email protected] Huangshan individuals, thus revealing differences DOI:10.24272/j.issn.2095-8137.2018.047 SciencePress ZoologicalResearch39(4):291–300,2018 291 of 1–6 bp and are ubiquitous in most eukaryotic organisms remainunidentifiedandthereisnoinformationavailableonthe (Oliveira et al., 2006). Microsatellite mutations are generally comparativeanalysisofwholegenomeSTRsduetothelackof derived from replication slippage, leading to the insertion or genomicdata. deletion of one or several repeat motifs (Levinson & Gutman, Whole genome profiles of STR variations in non-model 1987; Schlötterer, 2000; Schlötterer & Tautz, 1992). STRs organismshavenotyetbeenconducted,despitethediscovery located in both the coding and non-coding areas of DNA can of STRs in the 1980s (Guichoux et al., 2011). This is due affect gene expression and transcription (Chistiakov et al., to the associated expense as well as the inefficient and 2006; Li et al., 2004), especially STRs in the coding regions. laborious construction of genomic libraries and enrichment of Forexample,certainhumanneurologicaldisordersarecaused clonescontainingSTRmotifs(Castagnone-Serenoetal.,2010; bytrinucleotiderepeatexpansionsincodingregions(LaSpada Powell et al., 1996). Recently, the advent of high-throughput et al., 1991; MacDonald et al., 1993). Previous studies sequencing technology has produced considerable genomic have also indicated that the mutation processes of STR loci data, providing the opportunity for genome-wide analysis of can vary, even among closely related species, resulting in STR variations. To date, STR marker development and interspecific differences in STR allele size (Rubinsztein et al., polymorphismanalysesbasedonhigh-throughputsequencing 1995; Schlötterer, 1998; Webster et al., 2002). STRs are data have been performed in humans (Willems et al., 2014), frequently applied to genetic mapping, forensic identification, cattle(Xuetal.,2016),maize(Qu&Liu,2013),pig(Liuetal., and phylogenetic studies as molecular markers due to their 2017a),fababean(Abuzayedetal.,2017),andAngelicagigas high abundance, neutrality, high variability, and codominance Nakai (Giletal.,2017). However,thedistributionandvariation (Schlötterer,1998). Recentstudieshavealsoemphasizedthe ofgenome-wideSTRsintheTibetanmacaqueremainlargely roleofSTRsinthegeneticarchitectureofquantitativehuman unknown. To identify and characterize all potential STR loci traits(Gymreketal.,2016). Inaddition,STRshavebeenused and clarify the differences in STRs among Tibetan macaque as molecular markers for the study of genetic diversity and individuals, we systematically profiled STR distribution in the populationgeneticstructure(Qinetal.,2016;Sunnucks,2000). TibetanmacaquegenomeandSTRvariationsbetweenEmei, Here, we determined genome-wide STR variation in Tibetan Jianyang,andHuangshanindividualsbasedonwholegenome macaquepopulationsbasedonhigh-throughputsequencing. resequencingdata.Theseprofilingresultswillcontributetoour TheTibetanmacaque(Macacathibetana)isoneof23extant understandingofTibetanmacaquespeciesvariability. species of the genus Macaca (Primates: Cercopithecidae) MATERIALSANDMETHODS and is endemic to China. Its geographic distribution extends from southwest to east regions in China (Jiang et al., Genomicsamplesandhigh-throughputsequencing 1996). Jiang et al. (1996) divided M. thibetana into four Five Tibetan macaques, two obtained from Huangshan subspecies according to variations in external morphology, Mountain(AnhuiProvince,HS)andthreecapturedfromEmei hair color, cranial features, and geographical distribution: Mountain (one individual) and Jianyang city (two individuals) namely,Macacathibetanathibetana,Macacathibetanapullus, in Sichuan Province (SC), China, were used for whole Macaca thibetana huangshanensis, and Macaca thibetana genome resequencing. Genomic DNA was isolated from guizhouensis. The species is an increasingly studied animal blood and muscle tissue samples (Supplementary Table S1). modelinbiomedicalresearchduetoitseaseofdomestication, All procedures were approved by the local Ethics Review largebodysize,longlifespan,andphysiologicalcharacteristics Board and were in accordance with all relevant national and analogous to those of humans (Yi et al., 2012). Studies on international regulations. We generated 150-bp paired-end intraocularpressure(Liuetal.,2011)andlivertransplantation reads with insertion sizes of approximately 350 bp on the (Jietal.,2015)aretwogoodapplicationexamples. However, Illumina HiSeq X Ten platform. In total, 93.68–104.19 Gb of habitatdestruction,deforestation,andillegalpoachinghaveled clean data were obtained after filtering out low-quality reads to considerable fragmentation of natural Macaca populations, using the NGS QC toolkit with a quality score of 20 and witharesultingdecreaseinwildTibetanmacaquepopulations percentageofreadlengthlessthan75%ofgivenquality(Patel observed in recent years (Jiang et al., 2016). Currently, the & Jain, 2012) at a sequencing depth of over 30× in the five Tibetan macaque is a vulnerable and endangered species samples, respectively. The resequencing genome data were (Jiang et al., 2016), and is classified as a Category II depositedintheGSAdataset(http://bigd.big.ac.cn/gsa/)under speciesundertheChineseWildAnimalProtectionLaw. Thus, accessionnumberCRA000789. studies on its genetic variability and diversity are urgently GenomeassemblyandcharacterizationofSTRs neededtoassessgeneticdifferentiationanddevelopeffective The high-quality clean reads of one Emei Mountain macaque conservation strategies. To date, previous studies have were assembled using the short reads assembling program recognized56STRmarkersinTibetanmacaquesbygenomic ABySS2.0(Simpsonetal.,2009). Theresultingshortcontigs libraryconstructionandcross-speciesamplification(Jiaetal., (<150bases)wereexcludedfromtheassembly.Wethenused 2011, 2012; Li et al., 2014; Yang et al., 2017). Nevertheless, MSDB v2.4 (Du et al., 2013) to search STRs using the same the numbers of identified STR markers are insufficient for settings and parameters as described previously (Liu et al., accurateestimation,conservation,andmanagementofTibetan 2017b). macaques. Most STRs in the Tibetan macaque genome 292 www.zoores.ac.cn GenotypingofSTRsbasedonresequencingdata loci were screened using VCFtools by the application of GenotypingoftheSTRswasperformedusinglobSTRsoftware “--min-alleles 4” and “--maf 0.1” parameters. Ten loci from (version4.0.6),arapidandaccuratealgorithmforSTRprofiling theseSTRswererandomlyselectedtovalidateefficiencyand based on whole-genome sequencing data (Gymrek et al., polymorphisminthe16Tibetanmacaquesamplesbyagarose 2012).Thealgorithmcanavoidgappedalignmentandaddress gel electrophoresis and capillary electrophoresis. Cervus specificnoisepatternsinSTRcalling. Inthisstudy,alobSTR software was employed to calculate the observed number reference was first constructed based on rhesus macaque of alleles (Na), mean observed heterozygosity (Ho), mean (GCF_000772875.2)STRdata(generatedinLiuetal.,2017b) expected heterozygosity (He), and polymorphic information usinglobstr_index.pyscript. Theresequencingdatawerethen content(PIC). alignedtotherhesusmacaquereferencegenomewithdefault parameters. TheoutputBAMfilesweresortedwithSAMtools. RESULTS Finally, we analyzed STR allelotypes based on the alignment filesofthefivesamples. Intotal, weobtained840356STRs Sequencing, assembly, and STR overview of the Tibetan acrossthefivemacaqueindividualsafterremovinglow-quality macaquegenome STRs(QUAL<30),whichwereusedfordownstreamanalyses. We resequenced the genomes of five Tibetan macaques StatisticalanalyseswereperformedusingSPSSversion19. from Sichuan and Huangshan populations using Illumina sequencing technology, generating over 30-fold sequencing GeneticrelatednessanalysisofSTRs depth for each individual. After data filtering, we obtained TetranucleotideSTRlocifromallfivesampleswereemployed a total of 312 252 249 clean sequencing reads with 150-bp for genetic analysis. Each STR allele different from the and350-bpinsertfragments,correspondingto93.68–95.45G reference allele size was coded as a binary string of all base pairs with GC content of approximately 45% and over zeros, and the remaining alleles were set to one. Genetic 94.06%Q20bases(basequalitymorethan20)ineachsample distance was calculated using the vegan package in the R (Supplementary Table S1). In the assembled genome, we program(version3.0.2)(Team,2000). Finally,weconstructed acquired a total of 1 223 752 contigs with a total length of a phylogenetic tree based on the neighbor-joining method in 2.66 Gb and N50 of 4 966 bp. The average GC content MEGA5.2(Hall,2013). of the genomic sequences was 40.48%. The maximum and ScreeningofpolymorphicSTRs averagelengthsofthecontigswere97085bpand2172.38bp, Allele numbers of each STR loci were counted based on respectively (Table 1). The length distributions of the contigs genotypingdata,andhigh-quality,polymorphictetranucleotide canbeseeninSupplementaryFigureS1. Table1AssemblyresultsfortheM.thibetanagenome Item Number Totalnumberofsequencesexamined(n) 1223752 Totalsizeofexaminedsequences(bp) 2658459556 Meanlengthofexaminedsequences(bp) 2172.38 N50 4966 N90 1108 GCcontent(%) 40.48% NumberofsequencescontainingSTRs(n) 521523 NumberofsequencescontainingmorethanoneSTR(n) 252041 Hereafter, we analyzed the distribution of STRs in the loci/Mbwerethemostabundanttypes,accountingfor57.89% assembly using MSDB2.4 software. All 1 223 752 contigs of the total number of STRs, followed by tetra- (181 344 loci, generated in this study were used to identify potential STRs 16.83%)anddi-nucleotide(165769loci,15.38%)repeats,with withminimumrepeatsof12,7,5,4,4,and4forallmotifsfrom hexanucleotide(6347loci,0.59%)repeatstheleastabundant mono- to hexa-nucleotide STRs, respectively. As expected, (Table2).AveragelengthoftheSTRcoresequencesincreased perfect STRs were the most abundant category, followed withrepeatmotifsfrommono-tohexa-nucleotideSTRs,except by compound STRs, interrupted compound and interrupted for trinucleotide repeats, with the total average length found perfectSTRs,andcomplexSTRs(SupplementaryTableS2).A to be 19.19 bp. Analysis of GC content for the six STR totalof1077790perfectSTRs(accountingforapproximately types showed that GC content was highest in dinucleotide 0.78% of the assembly) were mined in 521523 contigs, with STRs, accounting for 48.75% of dinucleotide STRs in the 252 041 sequences containing more than one STR (Table 1). assembly, followed by tri-, hexa- and tetra-nucleotide repeats. Thefrequency,averagelength,andGCcontentofthepotential ThelowestGC-contentwasfoundinthemononucleotideSTRs, STRs were also analyzed. Among the 1 077 790 STRs, accountingfor0.48%inthegenome(Table2). mononucleotide repeats with the highest frequency of 234.7 ZoologicalResearch39(4):291–300,2018 293 Table2Number,length,frequency,density,andGCcontentofperfectSTRs Total Average Frequency Density GC GC Types TotalCounts Length(bp) Length(bp) (loci/Mb) (bp/Mb) Length Content(%) Mono- 623930 10397456 16.66 234.7 3911.083 50200 0.48 Di- 165769 3646216 22 62.36 1371.552 1777603 48.75 Tri- 64529 1216338 18.85 24.27 457.535 436005 35.85 Tetra- 181344 4372964 24.11 68.21 1644.924 1407970 32.20 Penta- 35871 873535 24.35 13.49 328.587 230585 26.40 Hexa- 6347 172866 27.24 2.39 65.025 60271 34.87 2067937 Total 1077790 19.19 405.42 7778.706 3962634 19.16 (0.78%) WealsofoundthatmostrepeatmotifswereAT-rich,except forthedinucleotiderepeats.Withafrequencyof233.81loci/Mb, the A/T mononucleotide repeat was the most frequent motif in the STRs, followed by the AC/GT (42.6 loci/Mb), AG/CT (17.09 loci/Mb), AAAC/GTTT (13.53 loci/Mb), ATTT/AAAT (11.76 loci/Mb), AAGG/CCTT (11.7 loci/Mb), AAAG/CTTT (11.35loci/Mb),AAC/GTT(8.47loci/Mb),GTTTT/AAAAC(6.22 loci/Mb),andATT/AAT(5.28loci/Mb)motifs. Thefrequencyof theremainingmotifswas36.52loci/Mb(Figure1).Thenumber ofmajorrepeatsrangedfrom12to40formono-,7to27fordi-, 5to20fortri-,4to15fortetra-,4to12forpenta-,and4to8 forhexa-nucleotiderepeat,respectively(Figure2). Theseloci accountedfor97.89%ofthetotalcountsofeachSTRtype. Figure1DistributionofSTRmotifsinM.thibetana AlignmentandgenotypingofSTRs To explore differences in STR distribution among the Tibetan macaques, we used lobSTR software to examine the high-throughputsequencingdataofthefiveindividualsfromthe Huangshan and Sichuan populations (Supplementary Table S1). A total of 1 060 419 STRs were found, with the 1 294 455 perfect STRs located in the rhesus macaque chromosomesusedasareference. Afterremovinglow-quality STRs(QUAL<30),weachievedatotalof840356STRsacross the five macaques. On average, we obtained genotypes for 713 685.8 STR loci, with a mean coverage of 5.94-fold per individual, of which 354 834.2 STR loci (49.72%) had 5-fold greatercallcoverage(Figure3A).Furtherobservationshowed that 526 699 STR loci (62.68%) of the 840 356 STRs called in our study were shared by the five macaques and 35 352 STRloci(4.20%)wereuniquetooneindividual.Inaddition,63 124, 87 390, and 127 791 STRs were identified in two, three, Repeat type and four individuals, respectively (Figure 3A). The STR loci Figure2DistributionofSTRtypesintheM.thibetanagenome showed greatest alignment to chr 1, followed by chr 2, chr3, byrepeattime chr 7, and chr 5, and finally chr Y (Figure 3B). Comparison ofallelenumberofdi-andtri-nucleotideSTRmotifsindicated ScreeningandinitialvalidationofpolymorphicSTRs that AT repeats had the highest proportion in dinucleotide Analysis of the allele number of STRs among the five STR motifs with greater than one allele, followed by AC, AG, individuals indicated that loci with only one allele were and CG repeats. For trinucleotide STR motifs with more underrepresented in all individuals, accounting for 1.41% than one allele, AAG, AAT, and AAC were found at higher (11869loci) of all loci. Among these loci, trinucleotide STRs proportions.Conversely,forSTRmotifswithonlyoneallele(no had the highest proportion, accounting for 6.03% (1 726 loci) polymorphism), the proportions of CG and CCG were higher ofthetotalnumberoftrinucleotideSTRs;andmononucleotide thanthoseoftheothermotiftypes(Figure3C). STRswererelativelyunderrepresented, accountingfor0.54% 294 www.zoores.ac.cn (3205loci)ofmononucleotideSTRs. STRlociwithtwoalleles highest proportion in mono- (245 673 loci, 41.56%) and accounted for the highest proportion in di- to hexa-nucleotide di-nucleotideloci(52180loci, 40.42%), followedbytetra-(13 STRs. The proportion of STRs with two alleles (22.93%, 307loci,18.84%),penta-(2560loci,14.50%),hexa-(456loci, 31.51%, 55.88%, 52.12%, 58.09%, and 58.15% from mono- 13.85%),andtri-nucleotideSTRs(3771loci,13.17%)(Figure to hexa-nucleotide STRs, respectively) increased with motif 3D).pSTRsaccountedfor37.83%ofthetotalSTRs. length. Loci with more than four alleles (pSTRs) had the A 6877 J021 3646 A011 5912 6064 3932 7269 J031 8550 8586 19717 18630 4967 5712 18520 7052 6704 526699 36550 22431 3090 8178 4870 7969 11376 11169 4824 30463 8960 6333 11129 8744 5433 HT-2 HT-1 732670 732670 719498 700233 688109 72919 366335 A011 J031 HT-2 J021 HT-1 526699 127791 87390 63124 5 4 3 2 1 (35352) Figure3AlignmentresultsofSTRs A:NumberofsharedandspecificSTRsamongthefiveM.thibetanasamples.B:DistributionofSTRsmappedtoreferencechromosomes.C:Comparisonof allelenumberfordi-andtri-nucleotideSTRmotifs.D:ComparisonofallelenumberforthesixtypesofSTRs. We also characterized 3 325 tetranucleotide STRs that fluorescence (FAM/HEX). Finally, four to seven alleles were showed at least four alleles with a frequency of over 0.1. detected across each locus of the six loci. The Ho and The coordinate information of these loci can be seen in He values ranged from 0.40 to 0.88 (mean=0.55) and 0.58 Supplementary Table S3. Ten loci from these STRs were to 0.81 (mean=0.69), respectively, and the PIC ranged from randomly selected to validate efficiency and polymorphism. 0.519to0.754(mean=0.625)foreachlocus(Table3). Primer Among them, the forward primers of six STR loci that were information of the six markers is shown in Supplementary amplified successfully in the 16 samples were labeled by TableS4. ZoologicalResearch39(4):291–300,2018 295 Table3Number,length,frequency,density,andGCcontentof To measure genetic relatedness, 41 166 tetranucleotide perfectSTRs STR loci called in all five macaques were used for relatedness analysis. The phylogenetic tree constructed Locus Na N Ho He PIC by neighbor-joining based on genotyping data of the JR05 4 16 0.875 0.659 0.590 tetranucleotidelocishowedthatthethreeindividualsfromSC were clustered onto one branch and the two individuals from JR09 7 16 0.750 0.808 0.754 HS were clustered onto another branch. On the SC branch, JR10 6 15 0.400 0.577 0.524 thetwoindividualsfromJianyangandoneindividualfromEM JR12 6 15 0.400 0.811 0.751 showedlittlegeneticdifferentiation(SupplementaryFigureS2). JR18 5 15 0.467 0.602 0.519 Thisresultwasconsistentwiththeirgeographicaldistribution. JR20 4 15 0.400 0.701 0.613 Na, observed number of alleles; N, sample number; Ho, mean observedheterozygosity;He,meanexpectedheterozygosity;PIC, polymorphicinformationcontent.. Variation and genetic relatedness analysis based on genotypingdataofSTRs To obtain more accurate and reliable results, we focused on the 526 699 loci called in all five macaques, including 354577mono-, 96370di-, 20271tri-, 41166tetra-, 12042 penta-, and 2 273 hexa-nucleotide loci. We divided these loci into four allelotype categories: homozygous reference wherebothallelesarealignedtothereference; heterozygous reference where one allele is aligned to the reference; homozygousnonreferencewherebothallelesarenotmapped Figure4Comparisonofthefourallelotypecategoriesfor to the reference but are the same; and heterozygous eachrepeattypeamongthefiveTibetanmacaques nonreference where both alleles are different and do not (Blue)homozygousreference;(red)heterozygousreference;(green) match the reference. For mononucleotide repeats, as shown in Figure 4 and Supplementary Table S5, the proportion homozygousnonreference;(purple)heterozygousnonreference.Columns of homozygous reference/nonreference loci was lower in representtheproportionofthefourallelotypecategoriesofA011,J021, SC than in HS, whereas, the proportion of heterozygous J031,HT-1,andHT-2fromlefttorightforeachrepeattype,respectively. reference/nonreference was higher in SC than in HS. The distributional pattern of the four allelotype categories in the DISCUSSION other repeat types was similar to that in the mononucleotide repeats(Figure4andSupplementaryTableS5).Theseresults GenomeassemblyandSTRdistributionofM.thibetana showedthatTibetanmacaquesinHShadahigherproportion Recently,withthedevelopmentofnext-generationsequencing of homozygous reference and nonreference loci than that in technology and advancement in bioinformatics analysis, SCforeachrepeattype. Conversely,SCindividualshadmore considerable research on genetic variation in primates has heterozygous reference/nonreference loci (Figure 4). Overall, beenconductedbasedongenome-widesequencing,including there was a far higher number of homozygous STRs than on humans (1000 Genomes Project Consortium et al., 2015), heterozygous STRs in all five Tibetan macaques, and the rhesus macaques (Liu et al., 2017b; Xue et al., 2016; Zhong difference was significant (P<0.05, t-test). The percentage et al., 2016), and cynomolgus macaques (Osada et al., of homozygous loci was positively correlated with the length 2015). To date, however, few genome-scale studies have of the repeat motif (Figure 4). Compared with the reference been reported in Tibetan macaques, except for the previous alleles, the proportion of base pair deletions in the STRs identification of 11.9 million single nucleotide variants (SNVs) was greater than that of insertions for the Tibetan macaques (Fan et al., 2014). Sequencing technology now offers an (P<0.05, t-test), and the proportion of insertions and mean unprecedented opportunity to examine STR loci of good variations in STR allele size were slightly larger in SC than quality and STR variations among Tibetan macaques. In the in HS (Supplementary Table S6). On average, these loci present study, based on resequencing data, STR variations demonstrateda3.4bpdifference,withdifferencesgreaterthan and characterizations were investigated at the genome-level 5 bp accounting for approximately 18.30% in each individual. in Tibetan macaques. These data provide a novel genomic Furthermore,insertionsaccountedfor30.60%oflociandbase resourceforM.thibetanaspeciesandenrichthedatabaseon pair deletions accounted for 49.83% of loci (Figure 5 and geneticvariantsinmacaques. SupplementaryTableS6). 296 www.zoores.ac.cn Base pair differences from reference (bp) Figure5DistributionofallelesizedifferencesinSTRsfromthereferenceforthefiveM.thibetanaindividuals TheidentificationanddevelopmentofTibetanmacaqueSTR (InternationalHumanGenomeSequencingConsortium,2001; molecularmarkersreportedinpreviousstudieswerebasedon Subramanianetal.,2003). theconstructionofgenomiclibraries(Jiaetal.,2011,2012;Li etal.,2014),whichisbothlaborintensiveandtimeconsuming. Characterization and polymorphism of STRs based on Here, we employed resequencing data generated on the genotypingdata Illumina platform to characterize the distributional pattern of Within the 840 356 potential STRs identified, 62.68% of loci STRs. Analysis of the genome showed an average GC were called in all five individuals and 4.20% of loci were content of 40.48%, similar to that reported in other macaque only present in one individual. This may be due to the genomes (Zimin et al., 2014). The N50 and mean length low sequencing coverage, besides true differences among of the assembled genome were 4 966 bp and 2 172.38 bp, different individuals. Notably, however, STRs mapped to the respectively,sufficientfordetectingSTRloci.Wefoundthatthe chromosomesexhibitedsimilardistributionalregularitywiththe proportionofSTRsintheassembledgenomewasslightlylower distributionofSTRsobservedinothermacaquechromosomes thanthe0.83%–0.88%obtainedinothermacaques(Liuetal., (Liu et al., 2017b). For instance, most STR loci were aligned 2017b). This may be because short contigs of less than 150 to chr 1. Several studies have indicated a positive correlation basesweredeletedfromtheassemblytoobtainmorereliable betweenchromosomesizeandSTRnumber(Liuetal.,2017b; and accurate STR loci, leading to an incomplete genome. Qi et al., 2015). Thus, the distribution of chromosome size Mononucleotide STR loci are the most abundant repeat type in M. thibetana may be similar to that of the rhesus macaque. in most organisms (Sharma et al., 2007), as was also found In addition, we noted that GC-rich STR motifs showed lower inour study. Dinucleotide andtetranucleotideSTRs werethe polymorphism, which may be due to the correlation between secondmostabundantSTRsandshowedsimilarfrequencies, GC-richsequencesandfunctionality(Benjamini&Speed,2012). whichisinaccordancewiththatobservedinothermacaques Analysis of allele number of STR loci showed that only (Liu et al., 2017b). Furthermore, analysis of GC content and 1.41% of the loci showed no variation among the five repeat times for each repeat type also showed consistent individuals. A similar occurrence has been reported in cattle resultswithpreviousstudy(Liuetal.,2017b).Thesesimilarities genome research (Xu et al., 2016). Here, loci with only one are a good explanation for the relatively high success rate allele were found at highest proportion among trinucleotide of cross amplification among macaques (Engelhardt et al., STRs compared with other STRs. Furthermore, pSTRs 2017).WediscoveredthatthepredominantSTRlocicontained showed the lowest proportion among tri- and hexa-nucleotide more A or T bases, except for dinucleotide repeats in which STRs. These two results showed that variation in tri- and the AC motif was the most abundant, which has also been hexa-nucleotide STRs was smaller than that in the other four observed in humans (Subramanian et al., 2003). This could STRtypes. Thiscouldbeattributedtothefactthatthesetwo beduetoareportedtendencythatthepoly(A)stretchesinthe STRtypesaremorefrequentinexons(Qietal.,2016),which genome might be more biased toward AT base pairs during arerelatedtogenefunction(LaSpadaetal.,1991;MacDonald STR evolution and GC-rich genome sequences are more et al., 1993). However, pSTRs were overrepresented in easilymutatedtoproduceA-richrepeatsthanAT-richregions general,indicatingthathighpolymorphicSTRswererelatively ZoologicalResearch39(4):291–300,2018 297 abundant in the Tibetan macaque genome. These loci were populations. Furtherstudiesareneededtoexaminetheeffect found at the highest proportion in mono- and di-nucleotide ofthesepotentialSTRvariations. STRs,followedbytetra-nucleotideSTRs;however,mono-and Geneticrelationshipanalysesofmicrosatellites di-nucleotideSTRsareunstableandmoreeasilyproduceerrors Theneighbor-joiningtreeconstructedbasedontetranucleotide inexperiments(Morinetal.,2001;Taberletetal.,1999).Wealso STRs classified the five individuals into two different foundthenumberoftetranucleotideSTRswashigherthanthatof branches according to their geographic origin. Little genetic tri-,penta-,andhexa-nucleotideSTRs. Thus,itismoreefficient differentiation was observed between the Jianyang and Emei toscreenpolymorphiclocifromtetranucleotideSTRs. individuals. TibetanmacaquesfromSichuanandHuangshan From the 3 325 polymorphic tetranucleotide STRs (≥4 were classified into two different subspecies (M. thibetana alleles) with allele frequencies over 0.1 detected in this study, thibetana and M. thibetana huangshanensis) based on their six loci were used for genetic diversity analysis of the 16 externalmorphologicalandanatomicalvariations(Jiangetal., M. thibetana samples. The allele number, Ho, He, and 1996), and the neighbor-joining tree confirmed the genetic PIC results revealed high genetic diversity within Tibetan differentiation between the two populations. Previous study macaques. Recent study reported 12 polymorphic loci in M. based on mitochondrial DNA has demonstrated that the two thibetana screened from 30 STR sites in humans (Yang et populations exhibit significant genetic differentiation (Zhong al., 2017), though the amplification efficiency was lower than et al., 2013), which may be the result of geographical these STR loci identified in our study. This result indicated barriers. Specifically, gene flow between the two populations that the polymorphic STR loci identified in the present study is obstructed, leading to genetic differentiation (Zhong et al., can be used for amplification in the M. thibetana genome 2013). Owing to the limited number of samples, our results and polymorphisms of STRs were relatively high. The provide an initial understanding of the genetic variation of combination of STR loci searched in the assembled genome STRsinTibetanmacaques, andfuturestudiesareneededto and polymorphic STRs identified in this study for isolation of investigate population genetic variations using more samples STRswasbothefficientandtime-conserving. fromdistinctpopulations. STRvariationamongfiveTibetanmacaqueindividuals CONCLUSIONS Based on the 526 699 STR loci called in all five samples, Insummary,weperformedSTRcharacterizationintheTibetan we found Tibetan macaques in SC had more heterozygous macaque genome and provided a genome-wide atlas of reference/nonreference loci than those in HS. This indicated microsatellitedistributionwithnext-generationsequencingdata. that genetic diversity in Tibetan macaques was higher in SC ThepolymorphicSTRlociidentifiedfromthereferencegenome than in HS. Previous studies based on mitochondrial DNA exhibited good amplification efficiency and can be well used suggest that overall genetic diversity in M. thibetana is high, for population genetics. This profiling will be conducive for and that genetic variation is higher in the SC population than futuregenome-widegeneticanalysesoftheTibetanmacaque in the HS population (Li et al., 2008; Zhong et al., 2013). at the population scale. Analysis of genome-wide STRs also Ourresultsareconsistentwiththeseconclusions. Analysisof preliminarily revealed variation in the genetic diversity among SNV distributions in the Tibetan macaque revealed that there Tibetan macaque populations. This study contributes to our wasahighernumberofhomozygousSNVsthanheterozygous further recognition of the genetic variation among Tibetan SNVs (Fan et al., 2014). A similar occurrence was also macaquespecies. found in the distribution of STRs, namely the majority of STRswerehomozygouslociintheTibetanmacaque(P<0.05, COMPETINGINTERESTS t-test). Comparedwiththerhesusmacaquereferencealleles, the proportion of base pair deletions in the Tibetan macaque Theauthorsdeclarethattheyhavenocompetinginterests wasgreaterthanthatofinsertions(P<0.05,t-test),consistent AUTHORS’CONTRIBUTIONS with the small indels genotyped with the Genome Analysis Toolkit (GATK) (Fan et al., 2014). The average length of J.L. and S.X.L. conceived and designed the study. W.H. and X.Y.Z. loci was 19.19 bp in the present assembly, which is slightly performedthemolecularlabwork.S.X.L.,X.Y.Z.,C.J.P.,andZ.X.F.analyzed shorter than the 20.14 bp for M. mulatta (Liu et al., 2017b). thedata. S.X.L.wrotethepaper. J.L.andW.H.revisedthepaper. B.S.Y. TheseresultsindicatethatalargeproportionofSTRlocimay participatedinthedesignofthestudyandprovidedanexperimentalplatform. have longer repeats in rhesus macaques than orthologous Allauthorsreadandapprovedthefinalversionofthemanuscript. loci in M. thibetana. This has also been reported from human-chimpanzee genomic sequence alignments (Webster REFERENCES etal.,2002). TheSTRlociineachTibetanmacaqueshowed an average difference of 3.40 bp from the rhesus macaque. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, The proportion of insertions and mean variations in STR GarrisonEP,KangHM,KorbelJO,MarchiniJL,McCarthyS,McVeanGA, allele size were slightly larger in the Emei and Jianyang AbecasisGR.2015.Aglobalreferenceforhumangeneticvariation.Nature, individuals than in the Huangshan individuals, which may 526(7571):68–74. reveal differences in allele size of STRs between the two AbuzayedMA,GoktayM,AllmerJ,DoganlarS,FraryA.2017.Development 298 www.zoores.ac.cn ofgenomicsimplesequencerepeatmarkersinfababeanbynext-generation macaque(Macacathibetana).ZoologicalResearch,17(4):361–369.(inChinese) sequencing.PlantMolecularBiologyReporter,35(1):61–71. JiangZG,JiangJP,WangYZ,ZhangE,ZhangYY,LiLL,XieF,CaiB,CaoL, BenjaminiY,SpeedTP.2012.SummarizingandcorrectingtheGCcontent ZhengGM,DongL,ZhangZW,DingP,LuoZH,DingCQ,MaZJ,TangSH, biasinhigh-throughputsequencing.NucleicAcidsResearch,40(10):e72. CaoWX,LiCW,HuHJ,MaY,WuY,WangYX,ZhouKY,LiuSY,ChenYY,Li Castagnone-Sereno P, Danchin EGJ, Deleury E, Guillemaud T, Malausa JT,FengZJ,WangY,WangB,LiC,SongXL,CaiL,ZangCX,ZengY,Meng T, Abad P. 2010. Genome-wide survey and analysis of microsatellites ZB,FangHX,PingXG.2016.RedlistofChina’svertebrates.Biodiversity in nematodes, with a focus on the plant-parasitic species Meloidogyne Science,24(5):500–551.(inChinese) incognita.BMCGenomics,11:598. LaSpadaAR,WilsonEM,LubahnDB,HardingAE,FischbeckKH.1991. ChistiakovDA,HellemansB,VolckaertFAM.2006.Microsatellitesandtheir AndrogenreceptorgenemutationsinX-linkedspinalandbulbarmuscular genomic distribution, evolution, function and applications: a review with atrophy.Nature,352(6330):77–79. specialreferencetofishgenetics.Aquaculture,255(1–4):1–29. Levinson G, Gutman GA. 1987. Slipped-strand mispairing: a major DuLM,LiYZ,ZhangXY,YueBS.2013.MSDB:auser-friendlyprogramfor mechanism for dna sequence evolution. Molecular Biology and Evolution, reportingdistributionandbuildingdatabasesofmicrosatellitesfromgenome 4(3):203–221. sequences.JournalofHeredity,104(1):154–157. LiDM,FanLQ,RanJH,YinHL,WangHX,WuSB,YueBS.2008.Genetic Engelhardt A, Muniz L, Perwitasari-Farajallah D, Widdig A. 2017. Highly diversityanalysisofMacacathibetanabasedonmitochondrialDNAcontrol polymorphicmicrosatellitemarkersfortheassessmentofmalereproductive regionsequences.DNASequence,19(5):446–452. skew and genetic variation in critically endangered crested macaques Li P, Yang CZ, Zhang XY, Li J, Yi Y, Yue BS. 2014. Development of (Macacanigra).InternationalJournalofPrimatology,38(4):672–691. eighteentetranucleotidemicrosatellitemarkersinTibetanmacaque(Macaca Fan ZX, Zhao G, Li P, Osada N, Xing JC, Yi Y, Du LM, Silva P, Wang thibetana)andgeneticdiversityanalysisofcaptivepopulation.Biochemical HX, Sakate R, Zhang XY, Xu HL, Yue BS, Li J. 2014. Whole-genome SystematicsandEcology,57:293–296. sequencingofTibetanmacaque(Macacathibetana)providesnewinsight Li YC, Korol AB, Fahima T, Nevo E. 2004. Microsatellites within genes: into the macaque evolutionary history. Molecular Biology and Evolution, structure, function, andevolution.MolecularBiologyandEvolution, 21(6): 31(6):1475–1489. 991–1007. GilJ,UmY,KimS,KimOT,KooSC,ReddyCS,KimSC,HongCP,Park Liu CC, Liu Y, Zhang XY, Xu XW, Zhao SH. 2017a. Characterization of SG,KimHB,LeeDH,JeongBH,ChungJW,LeeY.2017.Developmentof porcinesimplesequencerepeatvariationonapopulationscalewithgenome genome-wideSSRmarkersfromAngelicagigasnakaiusingnextgeneration resequencingdata.ScientificReports,7(1):2376. sequencing.Genes,8(10):238. LiuG,ZengT,YuWH,YanNH,WangHX,CaiSP,PangIH,LiuXY.2011. Guichoux E, Lagache L, Wagner S, Chaumeil P, Léger P, Lepais O, CharacterizationofintraocularpressureresponsesoftheTibetanmonkey LepoittevinC,MalausaT,RevardelE,SalinF,PetitRJ.2011.Currenttrends (Macacathibetana).MolecularVision,17:1405–1413. inmicrosatellitegenotyping.MolecularEcologyResources,11(4):591–611. Liu SX, Hou W, Sun TL, Xu YT, Li P, Yue BS, Fan ZX, Li J. 2017b. GymrekM,GolanD,RossetS,ErlichY.2012.lobSTR:ashorttandemrepeat Genome-wideminingandcomparativeanalysisofmicrosatellitesinthree profilerforpersonalgenomes.GenomeResearch,22(6):1154–1162. macaquespecies.MolecularGeneticsandGenomics,292(3):537–550. GymrekM,WillemsT,GuilmatreA,ZengHY,MarkusB,GeorgievS,Daly MacDonald ME, Ambrose CM, Duyao MP, Myers RH, Lin C, Srinidhi L, MJ,PriceAL,PritchardJK,SharpAJ,ErlichY.2016.Abundantcontribution BarnesG,TaylorSA,JamesM,GrootN,MacFarlaneH,JenkinsB,Anderson of short tandem repeats to gene expression variation in humans. Nature MA,WexlerNS,GusellJF,BatesGP,BaxendaleS,HummerichH,KirbyS, Genetics,48(1):22–29. NorthM,YoungmanS,MottR,ZehetnerG,SedlacekZ,PoustkaA,Frischauf HallBG.2013.BuildingphylogenetictreesfrommoleculardatawithMEGA. AM, Lehrach H, Buckler AJ, Church D, Doucette-Stamm L, O’Donovan MolecularBiologyandEvolution,30(5):1229–1235. MC,Riba-RamirezL,ShahM,StantonVP,StantonSA,DrathsKM,Wales International Human Genome Sequencing Consortium. 2001. Initial JL,DervanP,HousmanDE,AltherrM,ShiangR,ThompsonL,FielderT, sequencingandanalysisofthehumangenome.Nature,409(6822):860–921. WasmuthJJ,TagleD,ValdesJ,ElmerL,AllardM,CastillaL,SwaroopM, JiHC,LiX,YueSQ,LiJJ,ChenH,ZhangZC,MaB,WangJ,PuM,Zhou BlanchardK,CollinsFS,SnellR,HollowayT,GillespieK,DatsonN,Shaw L,FengC,WangDS,DuanJL,PanDK,TaoKS,DouKF.2015.PigBMSCs D,HarperPS.1993.Anovelgenecontainingatrinucleotiderepeatthatis transfectedwithhumanTFPIcombatspeciesincompatibilityandregulate expandedandunstableonHuntington’sdiseasechromosomes.Cell,72(6): thehumanTFpathwayinvitroandinarodentmodel.CellularPhysiology 971–983. andBiochemistry,36(1):233–249. Morin PA, Chambers KE, Boesch C, Vigilant L. 2001. Quantitative JiaXD,YangBD,YueBS,YinHL,WangHX,ZhangXY.2011.Isolationand polymerasechainreactionanalysisofDNAfromnoninvasivesamplesfor characterizationoftwenty-onepolymorphicmicrosatellitelociintheTibetan accurate microsatellite genotyping of wild chimpanzees (Pan troglodytes macaque(Macacathibetana).RussianJournalofGenetics,47(7):884–887. verus).MolecularEcology,10(7):1835–1844. JiaXD,ZhangXY,WangHX,WangZK,YinYH,YueBS.2012.Construction OliveiraEJ,PáduaJG,ZucchiMI,VencovskyR,VieiraMLC.2006.Origin, of microsatellite-enriched library and isolation of microsatellite markers in evolutionandgenomedistributionofmicrosatellites.GeneticsandMolecular Macacathibetana.SichuanJournalofZoology,31(1):39–44.(inChinese) Biology,29(2):294–307. Jiang XL, Wang YX, Wang QS. 1996. Taxonomy and distribution of tibetan Osada N, Hettiarachchi N, Adeyemi Babarinde I, Saitou N, Blancher A. ZoologicalResearch39(4):291–300,2018 299 2015.Whole-genomesequencingofsixMauritiancynomolgusmacaques beforeyouleap.TrendsinEcology&Evolution,14(8):323–327. (Macaca fascicularis) reveals a genome-wide pattern of polymorphisms TeamRC.2000.RLanguageDefinition.Vienna,Austria: RFoundationfor underextremepopulationbottleneck.GenomeBiologyandEvolution,7(3): StatisticalComputing. 821–830. WebsterMT,SmithNGC,EllegrenH.2002.Microsatelliteevolutioninferred PatelRK,JainM.2012.NGSQCToolkit:atoolkitforqualitycontrolofnext fromhuman-chimpanzeegenomicsequencealignments.Proceedingsofthe generationsequencingdata.PLoSOne,7(2):e30619. National Academy of Sciences of the United States of America, 99(13): PowellW,MachrayGC,ProvanJ.1996.Polymorphismrevealedbysimple 8748–8753. sequencerepeats.TrendsinPlantScience,1(7):215–222. WillemsT,GymrekM,HighnamG,The1000GenomesProjectConsortium, Qi WH, Jiang XM, Du LM, Xiao GS, Hu TZ, Yue BS, Quan QM. 2015. MittelmanD,ErlichY.2014.ThelandscapeofhumanSTRvariation.Genome Genome-Wide survey and analysis of microsatellite sequences in bovid Research,24(11):1894–1904. species.PLoSOne,10(7):e0133667. XuLY,HaaslRJ,SunJJ,ZhouY,BickhartDM,LiJY,SongJZ,Sonstegard Qi WH, Yan CC, Li WJ, Jiang XM, Li GZ, Zhang XY, Hu TZ, Li J, Yue TS,VanTassellCP,LewinHA,LiuGE.2016.Systematicprofilingofshort BS.2016.DistinctpatternsofsimplesequencerepeatsandGCdistribution tandemrepeatsinthecattlegenome.GenomeBiologyandEvolution,9(1): in intragenic and intergenic regions of primate genomes. Aging, 8(11): 20–31. 2635–2654. XueC,RaveendranM,HarrisRA,FawcettGL,LiuXM,WhiteS,DahdouliM, QinYJ,BuahomN,KroschMN,DuY,WuY,MalacridaAR,DengYL,Liu RioDeirosD,BelowJE,SalernoW,CoxL,FanGP,FergusonB,HorvathJ, JQ, Jiang XL, Li ZH. 2016. Genetic diversity and population structure in JohnsonZ,KanthaswamyS,KubischHM,LiuDH,PlattM,SmithDG,Sun Bactrocera correcta (Diptera: Tephritidae) inferred from mtDNA cox1 and BH,VallenderEJ,WangF,WisemanRW,ChenR,MuznyDM,GibbsRA,Yu microsatellitemarkers.ScientificReports,6:38476. FL,RogersJ.2016.Thepopulationgenomicsofrhesusmacaques(Macaca QuJT,LiuJ.2013.Agenome-wideanalysisofsimplesequencerepeatsin mulatta)basedonwhole-genomesequences.GenomeResearch, 26(12): maizeandthedevelopmentofpolymorphismmarkersfromnext-generation 1651–1662. sequencedata.BMCResearchNotes,6:403. YangN,ZhouL,LiuXS,ZengDW.2017.Isolationandcharacterizationof RubinszteinDC,AmosW,LeggoJ,GoodburnS,JainS,LiSH,MargolisRL, novelTibetanmacaque(Macacathibetana)microsatelliteloci:cross-species RossCA,Ferguson-SmithMA.1995.Microsatelliteevolution—evidencefor amplification and population genetic applications. Biomedical Research, directionalityandvariationinratebetweenspecies.NatureGenetics,10(3): 28(1):268–272. 337–343. YiY,ZengT,ZhouL,CaiSP,YinY,WangY,CaoX,XuYZ,WangHX,Liu SchlöttererC,TautzD.1992.SlippagesynthesisofsimplesequenceDNA. XY.2012.ExperimentalTibetanmonkeydomesticationanditsapplicationfor NucleicAcidsResearch,20(2):211–215. intraocularpressuremeasurement.InternationalJournalofOphthalmology, Schlötterer C. 1998. Genome evolution: are microsatellites really simple 5(3):277–280. sequences?CurrentBiology,8(4):R132–R134. Zhong LJ, Zhang MW, Yao YF, Ni QY, Mu J, Li CQ, Xu HL. 2013. Schlötterer C. 2000. Evolutionary dynamics of microsatellite DNA. GeneticdiversityoftwoTibetanmacaque(Macacathibetana)populations Chromosoma,109(6):365–371. from Guizhou and Yunnan in China based on mitochondrial DNA D-loop SharmaPC,GroverA,KahlG.2007.Miningmicrosatellitesineukaryotic sequences.Genes&Genomics,35(2):205–214. genomes.TrendsinBiotechnology,25(11):490–498. ZhongXM,PengJG,ShenQS,ChenJY,GaoH,LuanXK,YanS,YHuang SimpsonJT,WongK,JackmanSD,ScheinJE,JonesSJM,BirolI.2009. X, Zhang SJ, Xu LY, Zhang XQ, Tan BCM, Li CY. 2016. RhesusBase ABySS: a parallel assembler for short read sequence data. Genome PopGateway: Genome-Widepopulationgeneticsatlasinrhesusmacaque. Research,19(6):1117–1123. MolecularBiologyandEvolution,33(5):1370–1375. Subramanian S, Mishra RK, Singh L. 2003. Genome-wide analysis of Zimin AV, Cornish AS, Maudhoo MD, Gibbs RM, Zhang XF, Pandey S, microsatelliterepeatsinhumans: theirabundanceanddensityinspecific Meehan DT, Wipfler K, Bosinger SE, Johnson ZP, Tharp GK, Marçais genomicregions.GenomeBiology,4(2):R13. G, Roberts M, Ferguson B, Fox HS, Treangen T, Salzberg SL, Yorke JA, SunnucksP.2000.Efficientgeneticmarkersforpopulationbiology.Trendsin NorgrenRBJr.2014.Anewrhesusmacaqueassemblyandannotationfor Ecology&Evolution15(5):199–203. next-generationsequencinganalyses.BiologyDirect,9(1):20. TaberletP,WaitsLP,LuikartG.1999.Noninvasivegeneticsampling: look 300 www.zoores.ac.cn

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.