Ancestral Origins and Genetic History of Tibetan Highlanders


Please cite this article in press as: Lu et al., Ancestral Origins and Genetic History of Tibetan Highlanders, The American Journal of Human Genetics (2016), http://dx.doi.org/10.1016/j.ajhg.2016.07.002

ARTICLE

Dongsheng Lu,1,2,9 Haiyi Lou,1,9 Kai Yuan,1,2,9 Xiaoji Wang,1,2,3,9 Yuchen Wang,1,2,9 Chao Zhang,1,2,9 Yan Lu,1 Xiong Yang,1,2 Lian Deng,1,2 Ying Zhou,1,2 Qidi Feng,1,2 Ya Hu,4 Qiliang Ding,4 Yajun Yang,4 Shilin Li,4 Li Jin,4 Yaqun Guan,5 Bing Su,6 Longli Kang,7 and Shuhua Xu1,2,3,8,*

The origin of Tibetans remains one of the most contentious puzzles in history, anthropology, and genetics. Analyses of deeply sequenced (30×–60×) genomes of 38 Tibetan highlanders and 39 Han Chinese lowlanders, together with available data on archaic and modern humans, allow us to comprehensively characterize the ancestral makeup of Tibetans and uncover their origins. Non-modern human sequences compose ~6% of the Tibetan gene pool and form unique haplotypes in some genomic regions, where Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown ancestries are entangled and elevated. The shared ancestry of Tibetan-enriched sequences dates back to ~62,000–38,000 years ago, predating the Last Glacial Maximum (LGM) and representing early colonization of the plateau. Nonetheless, most of the Tibetan gene pool is of modern human origin and diverged from that of Han Chinese ~15,000 to ~9,000 years ago, which can be largely attributed to post-LGM arrivals. Analysis of ~200 contemporary populations showed that Tibetans share ancestry with populations from East Asia (~82%), Central Asia and Siberia (~11%), South Asia (~6%), and western Eurasia and Oceania (~1%). Our results support that Tibetans arose from a mixture of multiple ancestral gene pools but that their origins are much more complicated and ancient than previously suspected. We provide compelling evidence of the co-existence of Paleolithic and Neolithic ancestries in the Tibetan gene pool, indicating a genetic continuity between pre-historical highland-foragers and present-day Tibetans. In particular, highly differentiated sequences harbored in highlanders' genomes were most likely inherited from pre-LGM settlers of multiple ancestral origins (SUNDer) and maintained in high frequency by natural selection.

Introduction

Current knowledge of the origin and population history of Tibetan highlanders is still very much in its infancy and controversial. In particular, two key questions remain unsolved: (1) who did Tibetans descend from, and (2) how long have human beings been living at the Tibetan Plateau? Both archaeological and genetic studies have documented a human presence on the plateau as early as 30,000 years before present (YBP) [1,2]. Linguistic studies have suggested that the Tibetan and Chinese people share a common root ancestor and that the Tibetan-Chinese split took place ~6,000 YBP [3]. A recent genetic study utilizing exome sequencing data estimated a divergence time of 2,750 years between Tibetans and Han Chinese [4]. Existing archaeological and genetic data are not sufficient to address the discrepancy between times estimated by different studies. Analysis of Y chromosome and mitochondrial DNA (mtDNA) has demonstrated the co-existence of both Paleolithic and Neolithic ancestries in the paternal and maternal lineages of present-day Tibetans [5,6] but has not explicitly determined whether those Paleolithic lineages originated from anatomically modern human (AMH) or non-AMH (archaic hominid) species. Further investigation is required to confirm whether there is a genetic continuity, or merely a continuity of culture, between the pre-historical and present-day populations of Tibetans.

A recent study based on sequencing a single gene in Tibetans suggested that a particular haplotype motif consisting of five SNPs surrounding EPAS1 (MIM: 603349) was introgressed from Denisovans, an extinct non-modern human species [7]. This indicated the existence of a non-AMH-derived sequence in Tibetan genomes. Nonetheless, the extent to which the non-AMH sequences contribute to the genetic makeup is unknown. Neither is it clear where the non-AMH sequences in Tibetans originated. Were these sequences introduced indirectly by recent migrations of populations with both AMH and non-AMH ancestries? Or were the archaic sequences inherited directly from early Tibetan Plateau inhabitants and are thus independent of those found in other East Asian populations? Answering these questions is indispensable for understanding the demographic history of Tibetans and elucidating the mechanism of their adaptation to high-altitude environments.
1 Key Laboratory of Computational Biology, Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 School of Life Science and Technology, ShanghaiTech University, Shanghai 200031, China
4 State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai 200433, China
5 Department of Biochemistry and Molecular Biology, Preclinical Medicine College, Xinjiang Medical University, Urumqi 830011, China
6 State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
7 Key Laboratory for Molecular Genetic Mechanisms and Intervention Research on High Altitude Disease of Tibet Autonomous Region, School of Medicine, Xizang Minzu University, Xianyang Shaanxi 712082, China
8 Collaborative Innovation Center of Genetics and Development, Shanghai 200438, China
9 These authors contributed equally to this work
*Correspondence: [email protected]
http://dx.doi.org/10.1016/j.ajhg.2016.07.002. © 2016 American Society of Human Genetics.
The American Journal of Human Genetics 99, 1–15, September 1, 2016

Existing archaeological and genetic data are not sufficient to address the fundamental questions mentioned above. This manuscript describes a whole-genome deep-sequencing effort of Tibetan highlanders and samples of Han Chinese, a lowland-residing group that shares close ancestry with the Tibetan people. A careful analysis of these genomes, together with available data on archaic and modern humans, can provide further insights into the genetic prehistory and adaptations of Tibetans.

Material and Methods

Populations and Samples

Peripheral-blood samples of 33 Tibetans (TBN) and 5 Sherpas (SHP) were collected from six prefectures (Lhasa, Chamdo, Nagqu, Nyingchi, Shannan, and Shigatse) in the Tibet Autonomous Region (Figure S1 and Table S1), and blood samples of 39 Han Chinese (HAN) were collected from diverse regions in China (Figure S1 and Table S1). Each individual was the offspring of a non-consanguineous marriage of members of the same nationality within three generations. All samples were collected with informed consent and approved by the Biomedical Research Ethics Committee of the Shanghai Institutes for Biological Sciences. Prior to sequencing and analysis, all samples were stripped of personal identifiers (if any existed). All procedures were in accordance with the ethical standards of the Responsible Committee on Human Experimentation (approved by the Biomedical Research Ethics Committee of the Shanghai Institutes for Biological Sciences) and the Helsinki Declaration of 1975 (revised in 2000).

Genome Sequencing and Data Processing

Whole-genome sequencing, with high target coverage (30×–60×) for 150 bp paired-end reads, was carried out on an Illumina HiSeq X Ten according to Illumina-provided protocols with standard library preparation at WuXi NextCODE (Shanghai). Each sample was run on a unique lane with at least 90 GB of data that had passed filtering, and read data were quality controlled so that 80% of the bases achieved at least a base quality score of 30. Reads were merged, adaptor trimmed, and mapped to the human reference genome (GRCh37) with the Burrows-Wheeler Aligner [8]. Variant calling was carried out with the HaplotypeCaller module in the Genome Analysis Toolkit (GATK) [9,10].

We filtered the results of joint variant calling of HAN and TIB (TBN and SHP) samples and retained autosomal single-nucleotide variants (SNVs) that (1) were bi-allelic in HAN and TIB and (2) had a missing rate less than 20% in any one of the HAN, TBN, and SHP populations. Finally, we were left with 11,452,436 SNVs for estimating minor allele frequency, heterozygosity, inbreeding coefficient, and runs of homozygosity, etc. Individual heterozygosity, identity by state, identity by descent, and inbreeding coefficient were estimated with PLINK v.1.07 [11].

To determine the ancestral and derived allele of each variant obtained from our sequenced genomes and the public data integrated in our analysis, we downloaded the ancestral sequences from the 1000 Genomes Project; these were inferred from six primates in the Enredo-Pecan-Ortheus (EPO) pipeline. We employed BEDTools v.2.24.0 [12,13] to get the ancestral allele for each locus in the variant call format (VCF) files from the ancestral sequences. In the analyses related to the derived allele states, we only chose the loci with an ancestral allele (A, C, T, and G or a, c, t, and g) matching the "REF" or "ALT" allele in the VCF files. A SNP's ancestral state was not used if the SNP mapped to more than one genomic region. Finally, 11,021,340 SNVs with an unambiguously derived allele remained for comparison of derived-allele frequency.

Public and Published Data

Ancient genomes used in the analysis included those of an Altai Neanderthal [14], a Denisovan [15], and Ust'-Ishim, a 45,000-year-old modern human from Siberia [16]. The Affymetrix Human Origins genotyping dataset [17] and variants from phase 3 of the 1000 Genomes Project [18] were also included in the data analysis of this study (see Tables S3 and S4).

Determining Y Chromosomal and mtDNA Lineages

Y chromosomal and mtDNA haplogroups were determined on the basis of the key mutations commonly used for nomenclature. We developed an algorithm to search all possible combinations of the key mutations used for nomenclature from our sequence data to determine the fine-scale paternal and maternal haplogroup affiliated with each sample (Tables S5 and S6). To compare the frequency distribution of worldwide populations (because fine-scale sub-lineages of most worldwide populations were not available), we combined sub-lineages under each major paternal or maternal haplogroup.

Estimating Genetic Differences between Populations

Genetic differences between populations were measured with $F_{ST}$ according to Weir and Cockerham [19]; this metric accounts for differences in the sample size in each population. The SNP-specific $F_{ST}$ between Tibetan and Han Chinese for each SNP was also calculated with the same formula. Confidence intervals of the $F_{ST}$ over loci were calculated by bootstrap resampling with 1,000 replications. To reduce the influence of large sample-size differences between populations, we discarded HuOrigin populations [17] with a sample size smaller than five for pairwise comparison; these included Himba (4), Kikuyu (4), Datog (3), Oromo (4), Italian_South (1), Canary_Islanders (2), Saami_WGA (1), Scottish (4), Dolgan (3), Tlingit (4), Australian (3), Chane (1), Chilote (4), Inga (2), Piapoco (4), Ticuna (1), and Wayuu (1). We also discarded SNPs with a missing rate larger than 20% in any single population. This left us with 563,816 autosomal SNPs, which we analyzed to estimate pairwise $F_{ST}$ for 188 populations.

Principal-Component Analysis

Principal-component analysis (PCA) was performed at the individual level with EIGENSOFT v.3.0 [20]. To investigate fine-scale population structure and individual genetic affinities, we performed a series of PCAs by gradually removing "outliers" on the basis of a plot of the first and second principal components (PCs) and re-analyzing the remaining samples on the basis of the same set of SNV markers.

Using Outgroup f3 Statistics to Test Ancient Ancestry

We assessed the genetic relatedness between ancient and TIB genomes by computing outgroup $f_3$ statistics; this method is insensitive to genetic drift and less affected by sample-size bias when ancient genomes are involved [21].

We computed $f_3$(X; ancient, African) according to Reich et al. [21] to find the admixture signal from ancient genomes in modern human populations.
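As a rough illustration of the $f_3$ statistic used here, the sketch below computes a naive plug-in estimate of $f_3$(C; A, B) as the mean over SNPs of (c − a)(c − b), where a, b, and c are population allele frequencies, and attaches a Z score from a delete-one-block jackknife. This is only a minimal sketch under stated assumptions: the function names `f3` and `f3_z_score` are hypothetical, the blocks are simple contiguous runs of SNPs rather than genomic windows, and the finite-sample bias correction applied by published implementations is omitted. It is not the software used in the study.

```python
def f3(freq_c, freq_a, freq_b):
    """Naive f3(C; A, B): the mean over SNPs of (c - a) * (c - b)."""
    products = [(c - a) * (c - b) for c, a, b in zip(freq_c, freq_a, freq_b)]
    return sum(products) / len(products)


def f3_z_score(freq_c, freq_a, freq_b, n_blocks=10):
    """Z score of f3 from a delete-one-block jackknife over contiguous SNP blocks.

    Assumes the per-block estimates are not all identical (nonzero variance).
    """
    products = [(c - a) * (c - b) for c, a, b in zip(freq_c, freq_a, freq_b)]
    n = len(products)
    if not 2 <= n_blocks <= n:
        raise ValueError("need 2 <= n_blocks <= number of SNPs")
    total = sum(products)
    # Recompute the statistic with each block of SNPs left out in turn.
    leave_one_out = []
    for i in range(n_blocks):
        start, stop = i * n // n_blocks, (i + 1) * n // n_blocks
        block = products[start:stop]
        leave_one_out.append((total - sum(block)) / (n - len(block)))
    mean_loo = sum(leave_one_out) / n_blocks
    # Standard jackknife variance for (roughly) equal-sized blocks.
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    return (total / n) / variance ** 0.5
```

With the target population in the first position, a clearly negative value (for example |Z| > 2) is the admixture signal described in the text; with an outgroup in the first position, larger positive values indicate more shared drift between the other two populations.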
Ancient genomes used in the analysis included those of an Altai Neanderthal [14], a Denisovan [15], and Ust'-Ishim [16]. BAM files of the ancient genomes were downloaded from the respective references and managed with Picard v.1.112, and the genotypes were called with GATK v.3.3 [9]. Data on DNA polymorphisms were stored in VCF files and managed with VCFtools [22]. X represents worldwide non-African modern human populations (23 Native American, 23 Central Asian and Siberian, 25 East Asian, 4 Oceanian, 22 South Asian, and 58 West Eurasian). A significantly negative $f_3$(X; ancient, YRI [Yoruba in Ibadan, Nigeria]) (|Z| > 2) suggests ancient gene introgression. There is no significant admixture signal from any of the ancient genomes (Altai Neanderthal, Denisovan, or Ust'-Ishim), given that all $f_3$ values are significantly positive (|Z| > 2). Therefore, these analyses do not provide clear insight into the ancient admixture in modern humans, possibly because the analysis design of $f_3$(X; ancient, African) is not powerful enough or the data are insufficient, for example, the admixture proportion from ancient genomes is small or admixture occurred too long ago.

Further, we used an "outgroup" $f_3$ statistic, $f_3$(African; ancient, X), according to Raghavan et al. [23] to examine the relationship between modern human populations and ancient genomes (Altai Neanderthal, Denisovan, and Ust'-Ishim). Without gene flow from an ancient population, $f_3$(YRI; ancient, X) has the same value for a different X. A larger $f_3$ value suggests more allele sharing with ancient individuals and ancient gene introgression.

Estimation of Ne and Divergence Time

We applied pairwise sequentially Markovian coalescent (PSMC) analysis [24] and multiple sequentially Markovian coalescent (MSMC) analysis [25] to infer the long-term effective population sizes ($N_e$) and time of divergence ($T_{divergence}$) from the high-coverage genomes. PSMC analysis was implemented according to the original procedure [25]. For MSMC analysis, rather than phasing the data from bamCaller.py directly, we used only those SNVs with allele states identical to those of genotypes called by bamCaller.py from the VCF results from GATK analysis, and we phased the data with SHAPEIT2 [26].

Our estimation of effective population size over time and that of divergence times and time to the most recent common ancestor (TMRCA) in this study depends on estimates of the germline mutation rate ($\mu$). To overcome uncertain date estimates due to a lack of confidence in the true value of the mutation rate, we scaled time in units of $2\mu T$ and effective population size in units of $2\mu N_e \times 10^3$ (where $\mu$ is the mutation rate, $N_e$ is the effective population size, and $T$ is time in generations). We also calculated absolute estimation by assuming a fast mutation rate of $1.0 \times 10^{-9}$ per site per year (the commonly used phylogenetic mutation rate, i.e., ~$2.5 \times 10^{-8}$ per base per human generation for a generation time of 25 years) [27] or a slow mutation rate of $0.5 \times 10^{-9}$ per site per year (~$1.25 \times 10^{-8}$ per base per human generation for a generation time of 25 years) [28–30]. In most occasions, we adopted a slow mutation rate ($0.5 \times 10^{-9}$ per site per year) because results based on the slow mutation rate agree better with the paleoanthropological record and with estimates from mtDNA [31].

To study the influence of the non-modern human sequences on the estimation of $N_e$ and divergence time, we removed non-AMH sequences from the genomes used for analysis. To facilitate the analysis and avoid any bias of the fluctuation of sample size (or haplotype number) during removal of non-modern human sequences, we directly masked the genomic regions on the basis of S* results with a signature of non-modern human ancestry identified in any of the genomes for analysis. Therefore, we removed genomic regions with non-modern human ancestry rather than partially removing haplotypes.

Ancestry Analysis with ADMIXTURE

The model-based ancestry-estimation method ADMIXTURE [32] was applied on the merged HuOrigin, TIB, and HAN dataset consisting of 2,422 individuals from 205 worldwide populations. ADMIXTURE, which infers individual genetic-ancestry coefficients by conditioning on a specific number (K) of "ancestral populations," is a useful tool for analyzing population structure. We used it mainly to identify the structure of TIB and HAN in the context of worldwide and regional populations. Because the model in ADMIXTURE does not consider linkage disequilibrium (LD), we used PLINK v.1.07 [11] to prune the original dataset with dense SNPs. After we assigned an $r^2$ threshold of 0.4 in every continuous window of 200 SNPs advanced by 25 SNPs (--indep-pairwise 200 25 0.4), 288,979 SNPs remained for the ADMIXTURE analysis [32]. We ran ADMIXTURE with a random seed for the merged dataset from K = 2 to K = 20 with default parameters (--cv = 5) in ten replicates for each K.

Because the individual ancestry proportion varied significantly when K was large, we used CLUMPP [33] to examine the convergence of different runs in the ten replicates for each K; the optimal ancestry proportion was defined by the single replicate that showed the closest value to the ancestry proportion that appeared most commonly among the ten replicates. We also assessed the cross-validation error in the ten replicates to find the best K of the ancestral populations.

Estimating Ancient Gene Flow on the Basis of f4 Statistics

We used an $f_4$ test (D test), D(African, X, Neanderthal, Denisovan), according to Patterson et al. [34,35] to determine the main source of gene flow (either Neanderthal or Denisovan) to the target populations (particularly TIB). A significantly negative D value indicates more Neanderthal than Denisovan allele sharing in population X, a non-African modern human population. In contrast, a significantly positive D value (Z > 2) suggests more gene introgression from Denisovans than from Neanderthals. With the exception of Oceanian populations, non-African populations always share significantly more alleles with Neanderthals than with Denisovans.

Using D(African, ancient, TIB, X) according to Durand et al. [17,34], we further determined whether a certain ancient genome contributes more ancestry to TIB than to any other non-African modern human population. A significantly positive D value indicates that ancient genomes share a higher proportion of alleles with population X than with TIB.

Finally, we based four designs (four equations) on $f_4$ statistics to quantitatively estimate the ancient ancestry in modern human populations to find out whether there is excess archaic ancestry in TIB. We first randomly selected a population as the reference population (ref) and obtained a series of ratio values. The population showing the smallest ratio value, which is always negative, might have the relatively smallest gene introgression from ancient populations.
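To make the sign convention of the D test above concrete, here is a minimal sketch of a frequency-based D statistic for four populations. It assumes the commonly used form in which the numerator sums (p1 − p2)(p3 − p4) over SNPs and the denominator sums (p1 + p2 − 2·p1·p2)(p3 + p4 − 2·p3·p4); the function name `d_statistic` is hypothetical, and no standard error or Z score is computed. Whether the study used exactly this estimator is an assumption, not something the text states.

```python
def d_statistic(p1, p2, p3, p4):
    """Frequency-based D(P1, P2, P3, P4) over a set of SNPs.

    Each argument is a list of derived-allele frequencies (one per SNP).
    Assumes at least one SNP is informative, so the denominator is nonzero.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in zip(p1, p2, p3, p4):
        numerator += (a - b) * (c - d)
        denominator += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    return numerator / denominator


# Toy data in the order D(African, X, Neanderthal, Denisovan):
# at the single informative SNP, X carries the allele seen in the Neanderthal.
d = d_statistic([0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

In this toy input the result is negative, matching the reading in the text that a negative D(African, X, Neanderthal, Denisovan) reflects more Neanderthal than Denisovan allele sharing in X.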
Then, we used these reference populations to estimate ancient gene flow in the other non-African populations:

$$\text{ratio} = \frac{S(\text{ref};\ X;\ \text{ancient};\ \text{African})}{S(\text{ref};\ \text{ancient};\ \text{ancient};\ \text{African})}$$ (Equation 1) [36]

As in Equation 1, we used chimpanzee (chimp) as an outgroup and used Africans (YRI) as a reference population, which was assumed to have no ancient gene flow:

$$\text{ratio} = \frac{S(\text{African};\ X;\ \text{ancient};\ \text{chimp})}{S(\text{African};\ \text{ancient};\ \text{ancient};\ \text{chimp})}$$ (Equation 2) [36]

In view of the similarity between Neanderthals and Denisovans, the previous formulas might overestimate the gene flow of each ancient population. So, we also used another ancient population as an outgroup to control such an effect:

$$\text{ratio} = \frac{S(\text{African};\ X;\ \text{Neanderthal};\ \text{Denisovan})}{S(\text{African};\ \text{Neanderthal};\ \text{Neanderthal};\ \text{Denisovan})}$$ (Equation 3) [37]

In addition, we can also estimate the non-ancient admixture proportion. Here, Neanderthals and Denisovans are reference populations for each other:

$$\text{ratio} = 1 - \frac{S(\text{ref};\ X;\ \text{African};\ \text{chimp})}{S(\text{ref};\ \text{African};\ \text{African};\ \text{chimp})}$$ (Equation 4) [37]

Detecting Archaic Sequences in Modern Human Genomes

We further locally identified genomic segments of non-modern human origins by using the S* statistic [38] and ArchaicSeeker, a method developed in this study. The principle of this method is based on the high divergence between modern humans and archaic hominins [38–40] and the finding that archaic hominin introgressions are absent from sub-Saharan Africans [41]. We defined "E-allele" as an allele absent from Africans but observed in Eurasians [38–40]. In simulations, we found that the E-allele rate (defined as the number of E-alleles on a segment divided by the number of polymorphic sites in the region spanning the segment) is much higher on archaic introgressive segments than on modern human segments. We further implemented a hidden-Markov-model-based method to partition each Eurasian chromosome into segments with different E-allele rates and labeled each segment with one of the two hidden states. One hidden state represents segments that are of modern human origin and have a lower E-allele rate, and the other one represents segments that are of archaic hominin origin and have a higher E-allele rate. The process of the algorithm was as follows. We first set a penalty for state transitions ($\lambda$) of the algorithm and initial E-allele rates for two states. The initial E-allele rates are the observed E-allele rates on modern Eurasian and archaic hominin genomes. Second, we implemented the classic Viterbi algorithm to partition the Eurasian chromosomes into segments and labeled each segment with a hidden state. Third, with segments from step two, we re-estimated the E-allele rate for both hidden states. Fourth, we repeated steps two and three until convergence. In the Viterbi algorithm, we used the same initial probability for each state. We set the transition probability between two states as $e^{-\lambda}$ and set the emission probability of both states as their E-allele rates.

After we partitioned each chromosome into segments and labeled a hidden state for each segment, we selected segments in a state with a higher E-allele rate as candidates for archaic introgressions and implemented a three-stepped approach to filter false positives [42–44]. For each candidate segment, we first reconstructed a phylogenetic tree for the candidate segment, along with Neanderthal, Denisovan, African, and chimpanzee sequences. To detect Neanderthal-like and Denisovan-like sequences, we required the segment to coalesce first with Neanderthals and Denisovans, respectively. Second, we estimated the divergence time between the candidate and Neanderthal segments, denoted as $t_{Nean}$ (or between candidate and Denisovan segments, denoted as $t_{Deni}$). To detect Neanderthal-like sequences, we required $t_{Nean}$ to be less than 550,000 years ago [14] and $t_{Nean}$ to be less than $t_{Deni}$. To detect Denisovan-like sequences, we required $t_{Deni}$ to be less than 550,000 years ago and $t_{Deni}$ to be less than $t_{Nean}$. Third, we required the genetic length of the segments to be consistent with an introgression time later than 550,000 years ago to reject the ancestral polymorphism model.

On the basis of E-alleles and the algorithm described above, we further extended the method to the haplotype level and implemented the algorithm in a computer program, ArchaicSeeker, which is a more heuristic method of detecting archaic DNA sequences in present-day human genomes. For those detected archaic genomic segments, we classified them into three groups: Neanderthal-like, Denisovan-like, and unknown archaic. The first two groups were results from the three methods based on comparative analyses of the archaic genomes (Denisovan or Neanderthal). However, the unknown archaic segments, which could be detected only by the S* method, were defined as archaic-like signals that cannot be attributed to either modern human genomes or sequenced archaic genomes.

Local-Ancestry Inference

To understand the genetic makeup of some interesting regions on a finer scale, we developed a method on the basis of the results of ChromoPainter (cp) [45] to obtain the fine-scale ancestry makeup of a given genomic region. To infer the local ancestries of TIB with cp, we selected Han Chinese genomes and some available archaic genomes, including an Altai Neanderthal genome [14] and a Denisovan genome [15], as reference data. Moreover, because we observed that Tibetans shared a considerable proportion of ancestry with Siberian populations, we also included the Ust'-Ishim as a reference [16]. In addition, to avoid the false-positive inference of archaic ancestral segments, we included full sequence data of YRI from phase 3 of the 1000 Genomes Project in our reference population panel. With this reference panel, we classified the local genomes into six categories regarding their ancestry: Denisovan-like, Neanderthal-like, Ust'-Ishim-like, Han-Chinese-like, African-like, and uncertain ancestry.

Analyzing the Correlation between Ancestry and Altitude

To investigate whether there could be any relationship between the ancestral composition of individuals and the altitude where they reside, we grouped Tibetan individuals according to geographical regions and calculated the correlation between percentages of non-modern human sequences and altitude. In this analysis, Han Chinese and Sherpa samples were not included, although a much stronger correlation would be expected if they were. The Tibetan samples were grouped into seven regional populations according to the information provided in Table S1; Shigatse samples were grouped into two populations because three individuals were from a rather different altitude (Dingri, 4,300 m). Therefore, seven regional populations were used for this analysis: Lhasa (3,650 m), Nyingchi (3,000 m), Chamdo (3,240 m), Shannan (3,573 m), Shigatse (3,853 m), Nagqu (4,522 m), and Dingri (4,300 m). The altitude for each region was averaged over the number of individuals residing in a given region. Accordingly, the proportion of sequences from each category (non-modern human sequences, Neanderthal-like sequences, Denisovan-like sequences, Neanderthal-specific sequences, Denisovan-specific sequences, and unknown archaic sequences) was averaged over the number of individuals in the same region. Pearson's correlation coefficient (R) and coefficient of determination ($R^2$) were calculated on the basis of the estimated ancestry proportions and altitude data.

Estimating TMRCA

To estimate the TMRCA of a group of sequences (or haplotypes) in a certain region, we first calculated the average pairwise nucleotide differences of these sequences ($\pi$) according to the following formula:

$$\pi = \frac{2 \times \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \pi_{ij}}{(n + 1) \times n},$$

where $n$ is the number of sequences analyzed in a given genomic region ab, and $\pi_{ij}$ is the nucleotide difference between two sequences $i$ and $j$ ($i \neq j$). The TMRCA can be estimated with the following formula:

[equation illegible in the source]

[...] reference genome and that of the most recent common ancestor of humans and chimpanzees of region ab. $T_{Hum-HumChimp}$ is the divergence time of humans and chimpanzees, and 13 million years was used in this study.

Results

Genetic Affinities of TIB in the Context of Global Populations

A comparison of genetic differences measured by unbiased $F_{ST}$ [19] between TIB and 188 worldwide populations [17] (Tables S3 and S4) showed that TIB is most closely related to East Asian populations (Figures 1A and 1B), followed by Central Asian and Siberian populations (Figure 1C). These relationships remained consistent in separate analyses of TBN and SHP samples (Figures S2 and S3). In addition, these results were also confirmed by PCA (Figure 2 and Figures S4 and S5) and outgroup $f_3$ statistics [21] (Figure S6). Interestingly, TBN and SHP are distinguishable in a two-dimensional PC plot (Figure 2D), indicating that they are two distinct groups with genetic differences ($F_{ST}$ = 0.010), slightly smaller than those between TBN and HAN ($F_{ST}$ = 0.011). The overall genetic makeup of TIB is closer to surrounding populations living in the plateau, such as Tu ($F_{ST}$ = 0.006), Yizu ($F_{ST}$ = 0.007), and Naxi ($F_{ST}$ = 0.009), than to any other population worldwide (Figure 1B and Tables S7 and S8), most likely

Figure 1. Genetic Affinities of TIB in the Context of Worldwide Populations
Legend regions: Africa, West Eurasia, South Asia, Oceania, East Asia, Central Asia and Siberia, America.
(A) A fan-like chart showing genetic differences ($F_{ST}$) between TIB and worldwide populations. Each branch represents a comparison between TIB and 1 of the 256 populations; lengths are proportional to the $F_{ST}$ values, indicated by gray circles. The populations are classified by geographical regions and indicated with the colors shown in the legend. The populations in each region are presented in a clockwise order according to great-circle distance to Tibet.
(B) A fan-like chart showing $F_{ST}$ between TIB and East Asian populations.
(C) A fan-like chart showing $F_{ST}$ between TIB and Central Asian and Siberian populations.

Figure 2. Analysis of the First Two PCs of TIB Individuals and Other Population Samples
Geographical regions where the individuals are located are indicated with the colors shown in the legend. Numbers in brackets denote the variance explained by each PC. The HAN samples are classified into East Asian populations and are not highlighted in this illustration.
(A) In this two-dimensional PC plot, TIB samples are located closely in the East Asian cluster and share similar PC coordinates with some populations in South Asia and Central Asia and Siberia.
(B) PCA of TIB individuals and Eurasian samples (western Eurasian samples were excluded).
(C) PCA of TIB individuals and some East Asian samples.
(D) PCA of TBN and SHP individuals and samples of some closely related surrounding populations (Tu, Yizu, and Naixi).

as a result of direct ancestry sharing or reciprocal gene flow between these populations.

We caution that the relationship revealed by an overall analysis of the genome-wide data only reflects a limited aspect of the origins of the populations studied. The results should not be directly interpreted as the genetic origins and history of the Tibetans; rather, they reflect the population's present-day genetic makeup, which could be substantially influenced by recent gene flow from surrounding populations. This highlights the need to account for admixture when inferring population history.

Genetic Relationship and Divergence between TIB and HAN

Among the lowland populations, the data indicated that Han Chinese are most closely related to TIB ($F_{ST}$ = 0.011) (Figure 1B and Table S7); the two populations had an overall strong correlation in the allele-frequency spectrum (Figure S7), but Y chromosome and mtDNA data showed substantial differences between them (Tables S5 and S6). Y chromosome haplogroup D-M174 is an East-Asian-specific male lineage and is rare in populations from regions bordering East Asia (Central Asia, North Asia, and the Middle East), usually less than 5% [6,46–48]. In our data, D-M174 lineages were not observed in the 39 HAN samples (frequency was 0) but had a high frequency in TIB (66.6%; Table S5), indicating a very large difference in Y DNA lineages between the two groups. Such a large difference was not observed in mtDNA, but some haplogroups also showed considerable differences between groups. For example, the frequency of B5 in TBN (0%) was quite different from that in HAN (10.26%) (Table S6). Some haplogroups, such as A and M5, also showed substantial differences between SHP and TBN; however, because sample sizes, especially that of SHP, in our data were small, estimation of haplotype frequency is not reliable.

TIB showed an overall lower genetic diversity than HAN, including a lower level of heterozygosity, a higher level of runs of homozygosity (Figure S8), and a consistently smaller effective population size over the last ~15,000 years (Figures 3 and 4A), most likely because TIB is isolated on the Tibetan Plateau, whereas HAN has experienced recent population expansion. The average time of divergence between TBN and HAN was estimated to be ~15,000–9,000 YBP on the basis of a sequentially Markovian coalescent analysis [24,25] of high-coverage genomes. This divergence time between Tibetans and Han Chinese is thus much earlier than the estimate of 2,750 years ago by a previous study [4]. In contrast, the estimated divergence time between SHP and HAN was ~16,000–11,000 YBP, and that between TBN and SHP was ~11,000–7,000 YBP (Figure 4B). The divergence between TIB and HAN most likely resulted from recent migration to the Tibetan Plateau after the Last Glacial Maximum (LGM), a period of intense cold from ~26,500 to 19,000 YBP [49]. Subsequent gene flow between TBN and HAN or continuous migrations to the plateau most likely caused the divergence time between TBN and HAN to be slightly shorter than that between SHP and HAN.

Figure 3. Estimated Changes in Effective Population Size over Time
Axes: time (scaled in units of $2\mu T$), with absolute time on the top axis (5–10 kyr, 50–100 kyr, 0.5–1 Myr, 5–10 Myr), against population size (scaled in units of $4\mu N_e \times 10^3$). Series: HAN, TBN, SHP.
The estimation was based on PSMC analysis of single genomes from three modern human groups with deeply sequenced genomes in this study (33 TBN, 5 SHP, and 39 HAN). To overcome uncertain date estimates due to a lack of confidence in the true value of the mutation rate, we scaled time in units of $2\mu T$ (where $\mu$ is the mutation rate and $T$ is time in generations). We also provide an absolute estimation of time (top) under the assumption of a fast mutation rate of $1.0 \times 10^{-9}$ or a slow mutation rate of $0.5 \times 10^{-9}$ per site per year.

Ancestral Makeup of TIB

Ancestry analysis with ADMIXTURE [32] suggested that present-day Tibetans share the majority of their ancestry makeup with populations from East Asia (~82%), Central Asia and Siberia (~11%), and South Asia (~6%) and have minor ancestral relationships with western Eurasian (<1%) and Oceanian (<0.5%) populations (Figure 5 and Figures S9–S11). In contrast, HAN share much less ancestry with Siberian (~7%), South Asian (<0.5%), and Oceanian (~0%) populations but higher ancestry with East Asian populations (>90%).

We applied outgroup $f_3$ statistics [21] and $f_4$ statistics [17,34] to assess ancient ancestral contributions to TIB by analyzing available ancient genomes, including those of an Altai Neanderthal [14], a Denisovan [15], and Ust'-Ishim, a 45,000-year-old anatomically modern human from Siberia [16] (see Material and Methods). TIB was found to share slightly more alleles with ancient genomes than many worldwide

Figure 4. Estimated Changes in Historical Effective Population Size and Divergence Time between Populations
(A) The estimation was based on MSMC analysis of four genomes (eight haploid genomes) randomly selected from each of the three modern human groups with deeply sequenced genomes in this study (TBN, SHP, and HAN). The analysis was repeated twice with different combinations of different individuals, and two curves are displayed for each group in the plot with different colors. Curves with dashed lines denote the estimation based on the same set of genomes but with non-modern human sequences removed (AMH only).
populations, except for Oceanian populations, which (B)TheestimationwasbasedontheMSMCalgorithm.Twoindi- showed significantly higher shared archaic (both Deniso- vidualgenomeswererandomlyselectedfromeachgroupinthe van- and Neanderthal-like) ancestry than TIB, and East group pairs (TBN versus HAN, SHP versus HAN, or SHP versus Asianpopulations(especiallythosesurroundingTIB:Yizu, TBN), and in total four genomes (eight haploid genomes) were used for each MSMC analysis. The analysis was repeated eight Tu, andNaxi),whichhadsimilarorevenhigher levelsof times,andeightcurvesaredisplayedforeachgroupintheplot sharedarchaicancestrythanTIB(TablesS9,S10,S11,S12, withdifferentcolors.Here,weshowtheresultsbasedonabsolute S13,S14,andS15andFiguresS12–S26).Inparticular,no estimationoftimeundertheassumptionofaslowmutationrate modern human populations shared more alleles with of0.5310(cid:2)9persiteperyear. Ust’-IshimthanTIB(TableS13andFigureS14).Although theexactestimatesvariedandtheabsolutevaluesarenot ofnon-AMHsequencesintheTIBgenepool(6.17%)was comparable, these genome-wide analyses indicate that significantly higher than that in the HAN gene TIBsharesconsistentlyhigherancestrywithancientnon- pool(5.86%)(p<10(cid:2)5)(Figures6AandFigureS27A).Re- AMH genomes than with HAN, the lowland population stricting the comparison to Neanderthal-like and Deniso- thathastheclosestmodernrelationshipwithTIB. van-like sequences, we did not observe any significant differencesbetweenTIBandHANasapercentageofNean- IndividualArchaicAncestryIsHigherinTIBThan derthal-likesequencesperindividual(Figure6B),although inHAN we did observe some marginally significant differences Genomic segments of non-AMH origins were identified whenwerestrictedthecomparisontoNeanderthal-specific with the S* statistic,38 as well as two methods developed sequences (p ¼ 0.01; Figure S27B). 
Significant differences inthisstudy(seeMaterialandMethods).Thetotalamount werealsoobservedbetween TIB andHAN onthe basisof TheAmericanJournalofHumanGenetics99,1–15,September1,2016 7 Pleasecitethisarticleinpressas:Luetal.,AncestralOriginsandGeneticHistoryofTibetanHighlanders,TheAmericanJournalofHuman Genetics(2016),http://dx.doi.org/10.1016/j.ajhg.2016.07.002 A a LBNiethUolauIrkOcrwaruearnesclHnaSigaiaCauinciEaFdannnzdoSCrninngeeatipgcrtnacnaioCslFcrhinaaBshihriantseha_ianhanSsr_nqcyoNS_huuIpoeIsttharBlaatnluIhintaialsgdnlihaea_rnrNia_oSnArotlhbuatThnuisacnsahGnkyreeSneiackzilyia_SnJaerwdiniManaltesCeypriot 11.63.Karitian1%%SuruiWayuuTicunaChanePiapocoCabecarMixeGuaranKiaqchAiykelmaPriamaMixteZcaIpnotgeacQueBcolhiMuviaaayCnahilCohtiAelpgOejoiCwnbryTqelawiueniaAnlnEgeistCutkKiihItoaumMrenklYyoacNmEuaDhnYikkgvSeTsaoaeianuelkgnnglbuhkaaiatusrnlpaanr Estonian 81.6% Altaian Finnish Tuvinian Mordovian Ulchi Russians_North TIB Oroqen Chuvashe Hezhen Armenian Daur Georgian Xibo GeorAgNiboaCkrnthhYAh_eeaOdJLmcsKeyeshieBuagwszenmanheigKntlNtYeiikyeanealoak_BmmngrJIeSreaeydiaankwouqPeiuadisl_iIneeJrBDsareBetinudiwnzioaJeaunoirn_nLdJeAaebniwTauanrnekiSssyrehi_aJInreawniTuarnkiMsakhraBnirahBuailochKialashPathanSi7ndhiG.uja0ratiA%BurushoCochin_JewTiwari9GujaratiB2.GujaratiC3%MalaGujaratiDLodhiPunjabiVishwHabrahBenAgaliKNhariaOngeKusunHdaazarBaougAauinsvtPirlalaelipaunTaanjiTkus_rUkPmzabmeKeniyrkUrgyyAgzutiaAarnymSailhDeaMiHiaTaounKzjuiLianaKhhoHHurTaeAYInNaBNi_zJnaTCTNuaxMuhpaoiaarmotinhnbegosodelian m in Figure5. 
SummaryPlotofGeneticAdmixture Theresultsofindividualadmixtureproportionsestimatedfrom592,799autosomalSNPswithgenotypedataavailablefor38TIB,39 HAN,and2,345HuOriginsamples(Africansampleswerenotincluded).Eachindividualisrepresentedbyasinglelinebrokeninto K¼7coloredsegmentswithlengthsproportionaltotheK¼7inferredclusters.ThepopulationIDsarepresentedoutsideofthecircle oftheplot.Theresultsofpopulation-leveladmixtureofTIBandHANaresummarizedanddisplayedinthetwopiechartsinthecenterof thecircleplot;admixtureproportionsaredenotedaspercentagesandwithdifferentcolors. Denisovan-like sequences per individual (p ¼ 0.001; tion(see Material and Methods), a strong correlation was Figure 6C), and this difference was more pronounced observed between the average proportion of non-AMH whenthecomparisonwasrestrictedtoDenisovan-specific sequences present in Tibetan individuals and altitude sequences(p<10(cid:2)5;FigureS27C).Together,thisanalysis (PearsonR2¼0.855;p¼0.0083;Figure6D),butnosignif- suggeststhatthemajordifferencesbetweenTIBandHAN icantcorrelationwasobservedinrelationtoNeanderthal- resultedfromthecontributionofDenisovan-likeandun- likeorDenisovan-likesequences(Figures6Eand6F).These knownnon-AMHsequences(Figure6andFigureS27). results indicate that non-AMH sequences have some association with altitude, possibly contributing to the AltitudinalCorrelationofArchaicAncestry adaptation of Tibetans, but the main contribution seems Interestingly,whentheTibetansamplesweregroupedinto to originate from unknown non-AMH ancestry sevenregionalpopulationsaccordingtogeographicalloca- (Figure S27D) rather than from Neanderthal-like or 8 TheAmericanJournalofHumanGenetics99,1–15,September1,2016 Pleasecitethisarticleinpressas:Luetal.,AncestralOriginsandGeneticHistoryofTibetanHighlanders,TheAmericanJournalofHuman Genetics(2016),http://dx.doi.org/10.1016/j.ajhg.2016.07.002 A D Figure 6. 
Distribution of Non-modern Human Sequences among Individuals and CorrelationswithAltitude (A)Acomparisonofthepercentageofnon- modernhumansequencesperindividualbe- tween TIB (6.17 5 0.10) and HAN (5.86 5 0.07) demonstrates significant differences (p < 10(cid:2)5). On the horizontal axis, T, S, andHdenoteTibetan,Sherpa,andHanChi- nese,respectively. (B)AcomparisonofthepercentageofNean- B E derthal-like sequences per individual be- tween TIB (1.04 5 0.04) and HAN (1.02 5 0.04) shows significant differences (p ¼ 0.379). On the horizontal axis, T, S, and H denote Tibetan, Sherpa, and Han Chinese, respectively. (C)AcomparisonofthepercentageofDeni- sovan-likesequencesperindividualbetween TIB (0.42 5 0.02) and HAN (0.40 5 0.02) shows slightly significant differences (p ¼ C F 0.001); the statistical significance (p value) was obtained by permutationtests repeated 100,000 times.Onthehorizontal axis,T,S, andHdenoteTibetan,Sherpa,andHanChi- nese,respectively. (D)Correlationbetweentheaveragepropor- tion of non-modern human sequences in regional Tibetan populations and altitude (PearsonR2¼0.855;p¼0.0083). (E)Correlationbetweentheaveragepropor- tionofNeanderthal-likesequencesperindi- vidual and altitude (Pearson R2 ¼ 0.216; p¼0.354). (F)Correlationbetweentheaveragepropor- tionofDenisovan-likesequencesperindivid- ual and altitude (Pearson R2 ¼ 0.412; p¼0.170). Denisovan-like sequences. A similar pattern and conclu- some2,encompasseseightgenes(EPAS1,LOC101805491, sion was apparent when the analysis was restricted to TMEM247, ATP6V1E2, RHOQ [MIM: 605857], Neanderthal-specific and Denisovan-specific sequences LOC100506142, PIGF [MIM: 600153], and CRIPT [MIM: (FiguresS27EandS27F). 604594]), and has extreme differences between TIB and HAN (F > 0.65) for SNVs with a high frequency of ST AHighlyDifferentiatedGenomicRegionwith derived alleles. These differences are in sharp contrast to ElevatedArchaicAncestryinTIB thegenome-wideaverage(F ¼0.011). 
ST Theaboveresultsrevealthatnon-AMH-derivedsequences not only exist in the Tibetan genomes but are also very EntangledAncestriesandTheirAncientOriginsin common. However, it is not clear whether these non- the~300kbRegion AMH sequences derived from recent migrations to Tibet The ancestral pattern in the ~300 kb region is extremely withbothAMHandnon-AMHancestriesorwereinherited complicated, such that it contains a mix of Denisovan, fromtheancientcolonizationoftheTibetanPlateau.The Neanderthal, ancient Siberian, and unknown archaic an- overall spatial distributions of non-AMH-derived se- cestries,whichareelevatedinTIB(TableS16andFigure7). quences are similar between TIB and HAN (Figure S28), The unusually high frequency of archaic sequences and andtheTMRCAfornon-AMHsequencesinbothTIBand substantial differences between TIB and HAN—as well as HANwasestimatedtobe>40,000years(seeMaterialand other worldwide populations—cannot be explained by Methods). These results might be not surprising, given recent gene flows or incomplete linage sorting (Figure 7 that TIB and HAN share recent genetic drift (Figures 1 and Table S17). These highly differentiated sequences and2)andthemajorityoftheirgeneticmakeup(Figure5). harbored in the genomes of present-day Tibetan high- Nonetheless, when frequency was taken into account, a landers were most likely ‘‘inherited’’ from earlier settlers ~300kbregionwasfoundtobeconsiderablydifferentbe- rather than ‘‘introgressed’’ by later arrivers. The compli- tween TIB and HAN, such that both Denisovan-like and cated ancestral architectures in these regions have many Neanderthal-like ancestries were significantly elevated in implications for Tibetan origins and their pre-history. 
TIB(FigureS29).This~300kbregionislocatedonchromo- The surviving archaic sequences in the ~300 kb region TheAmericanJournalofHumanGenetics99,1–15,September1,2016 9 Pleasecitethisarticleinpressas:Luetal.,AncestralOriginsandGeneticHistoryofTibetanHighlanders,TheAmericanJournalofHuman Genetics(2016),http://dx.doi.org/10.1016/j.ajhg.2016.07.002 Figure 7. Local-Ancestry Inference of a~300kbRegioninTIBandEightOther Populations The local ancestry was jointly inferred with ArchaiSeeker and ChromoPainter45 (see Material and Methods). The upper panelshowsgeneslocatedinthe~300kb region (chr2: 46,577,796–46,870,806). The two vertical arrows indicate the two previously identified Tibetan-specific varianttags,thefive-SNPmotif,7andthe Tibetan-enriched deletion (TED).50 The positionsofancestry-informativemarkers (AIMs)fortheDenisovan(DENAIM),the Neanderthal (NEA AIM), and Ust’-Ishim (UST AIM) are also displayed in the following rows. For the AIMs of ancient samples identified here (DEN, NEA, and UST), we required the frequency of the ancient samples’ alternative allele (with respect to reference genome GRCh37) to be >0.5 in TIB and <0.1 in other world- wide populations. The lower panels exhibittheinferredancestryofhaplotypes inTIB,HAN,CHB(HanChineseinBejing, China; 1000 Genomes), CHS (Southern Han Chinese; 1000 Genomes), JPT (Japa- nese in Tokyo, Japan; 1000 Genomes), PAP (Papuan),51 SIB (Siberians),51–53 SSIP (Singapore Indians),54 and SSMP (Singapore Malay).55 Each row represents a haplotype with ancestry derived from Neanderthals14 (red), Ust’-Ishim16 (orange), Denisovans15 (blue), or Han Chinese (green) or uncertain ancestry (including unknown archaic and uncer- tainmodernhuman;gray). groups before the post-LGM arrivals. We suggest that SUNDer, an un- known ancient group of multiple ancestral origins, introduced the ancient haplotypes that are unique to present-day Tibetan highlanders (Figure8). 
ATwo-Wave‘‘Admixtureof Admixture’’Model Our data support that the Tibetan genomeappearstohavearisenfroma mixture of multiple ancestral gene pools, but the ancestral composition ismuchmorecomplicated,anditshis- tory can be traced back considerably earlierthan previously suspected.For trace their ancestries back to ~62,000–38,000 years ago instance,arecentstudyhassuggestedthat‘‘Tibetansarea (Table S17), pre-dating the LGM49 and indicating that mixtureofancestralpopulationsrelatedtotheSherpaand thecolonizationofhumansinTibetismuchmoreancient Han Chinese.’’56 We propose a two-wave ‘‘admixture of thanpreviouslythought. Themixedarchaicancestriesin admixture’’ (AoA) model, despite its simplicity, to help this ~300 kb region and simulation studies indicate that explain the ancestral makeup and pre-history of Tibetans TibethasbeenahumanmeltingpotsincethePaleolithic and Sherpas (Figure 9). An ancient wave of admixture age, despite its inhospitable environment and the fact occurredinapre-LGMera>40,000YBP,whichcouldhave that interbreeding occurred among different hominin resulted in the unique mosaic pattern of Paleolithic 10 TheAmericanJournalofHumanGenetics99,1–15,September1,2016
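
As a worked illustration of the time scaling described in the Figure 3 caption, the sketch below converts a scaled time of 2μT into absolute time by dividing by 2μ, using the fast and slow mutation rates quoted there. It assumes that μ is expressed per site per year, so that T comes out in years; the function and constant names are hypothetical and are not part of PSMC.

```python
# Minimal sketch: convert a scaled time (2 * mu * T) into absolute time.
# Assumption: mu is given per site per year, so T is returned in years.

FAST_MU = 1.0e-9  # fast mutation rate quoted in the Figure 3 caption
SLOW_MU = 0.5e-9  # slow mutation rate quoted in the Figure 3 caption


def scaled_time_to_years(scaled_time, mu_per_site_per_year):
    """Solve scaled_time = 2 * mu * T for T."""
    if mu_per_site_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return scaled_time / (2.0 * mu_per_site_per_year)


def year_range(scaled_time):
    """Return (years under the fast rate, years under the slow rate)."""
    return (
        scaled_time_to_years(scaled_time, FAST_MU),
        scaled_time_to_years(scaled_time, SLOW_MU),
    )


if __name__ == "__main__":
    # Hypothetical tick values on the scaled time axis.
    for tick in (1e-5, 1e-4, 1e-3, 1e-2):
        fast, slow = year_range(tick)
        print(f"2*mu*T = {tick:g}: {fast:,.0f} to {slow:,.0f} years")
```

Under these assumptions, a scaled time of 10⁻⁵ corresponds to roughly 5,000 to 10,000 years, and each tenfold increase in scaled time multiplies both bounds by ten.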

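
The Figure 6 caption states that p values were obtained from permutation tests repeated 100,000 times, but it does not name the test statistic. As a minimal sketch, assuming a two-sided test on the difference in group means, a comparison of per-individual percentages between two groups could look like the following. The function name, the default number of permutations, the seed, and the toy values are all hypothetical and are not taken from the study.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Assumption: the statistic is mean(group_a) - mean(group_b); the source
    does not say which statistic was used.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(pooled))
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(mean_diff(pooled)) >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Toy percentages for illustration only, not data from the study.
    tib_like = [6.3, 6.1, 6.2, 6.0, 6.25, 6.15]
    han_like = [5.9, 5.8, 5.85, 5.95, 5.8, 5.9]
    print(permutation_p_value(tib_like, han_like))
```

With the add-one correction, the smallest p value that can be reported is 1 / (n_permutations + 1), so more permutations are needed to resolve smaller p values.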