INVESTIGATION Higher Levels of Neanderthal Ancestry in East Asians than in Europeans JeffreyD.Wall,*,†,1MelindaA.Yang,‡FloraJay,‡SungK. Kim,*,2EricY. Durand,‡,3Laurie S.Stevison,* ChristopherGignoux,*AugustWoerner,§MichaelF.Hammer,§andMontgomerySlatkin‡ *InstituteforHumanGeneticsand†DepartmentofEpidemiologyandBiostatistics,UniversityofCalifornia,SanFrancisco,California 94143,‡DepartmentofIntegrativeBiology,UniversityofCalifornia,Berkeley,California94720,and§DepartmentofArizona ResearchLaboratories,UniversityofArizona,Tucson,Arizona85721 ABSTRACTNeanderthalswereagroupofarchaichomininsthatoccupiedmostofEuropeandpartsofWesternAsiafrom(cid:1)30,000to 300,000 years ago (KYA). They coexisted with modern humans during part of this time. Previous genetic analyses that compared adraftsequenceoftheNeanderthalgenomewithgenomesofseveralmodernhumansconcludedthatNeanderthalsmadeasmall(1– 4%)contributiontothegenepoolsofallnon-Africanpopulations.Thisobservationwasconsistentwithasingleepisodeofadmixture fromNeanderthalsintotheancestorsofallnon-AfricanswhenthetwogroupscoexistedintheMiddleEast50–80KYA.Weexamined the relationship between Neanderthals and modern humans in greater detail by applying two complementary methods to the published draft Neanderthal genome and an expanded set of high-coverage modern human genome sequences. We find that, consistentwiththerecentfindingofMeyeretal.(2012),NeanderthalscontributedmoreDNAtomodernEastAsiansthantomodern Europeans.FurthermorewefindthattheMaasaiofEastAfricahaveasmallbutsignificantfractionofNeanderthalDNA.Becauseour analysis is of several genomic samples from each modern human population considered, we are able to document the extent of variation in Neanderthal ancestry within and among populations. Our results combined with those previously published show that a more complex model of admixture between Neanderthals and modern humans is necessary to account for the different levels of Neanderthalancestryamonghumanpopulations.Inparticular,atleastsomeNeanderthal–modernhumanadmixturemustpostdate theseparationoftheancestorsofmodernEuropeanandmodernEastAsianpopulations. NEANDERTHALSwereagroupofarchaichomininsthat etal.1999;Trinkaus2007).InitialcomparisonsofNeander- occupied large parts of Europe and West Asia from thalandmodernhumanDNAfoundnoevidenceforaNean- (cid:1)30,000 to 300,000 years ago (KYA) (Stringer and Hublin derthal contribution to the modern human gene pool 1999;Hublin2009).Theirdisappearanceinthefossilrecord (Krings et al. 1997; Serre et al. 2004; Noonan et al. 2006). often coincides with the first appearance of anatomically However, indirect studies of patterns of linkage disequilib- modern humans (AMH) in that region (Finlayson 2004). rium(LD)incontemporaryhumanpopulationshaveconsis- Where, when, and how often Neanderthals interbred with tently found support for admixture between “archaic” expanding AMH populations is still an open question. Mor- humangroups(suchasNeanderthals)andmodernhumans phological studies have generally concluded that Neander- (Garriganetal.2005a,b;PlagnolandWall2006;Walletal. thals made little or no contribution to present-day human 2009; Hammer et al. 2011; Lachance et al. 2012). populations (Stringer and Andrews 1988; Lahr 1994), but A detailed analysis of a draft Neanderthal genome and others have suggested there was some admixture (Duarte five low-coverage (4·) human sequences estimated that Neanderthals made a 1–4% contribution to the gene pool ofmodernnon-Africanpopulations(Greenetal.2010).The Copyright©2013bytheGeneticsSocietyofAmerica doi:10.1534/genetics.112.148213 presence of “Neanderthal DNA” in East Asians and Melane- ManuscriptreceivedNovember30,2012;acceptedforpublicationFebruary4,2013 sians was initially surprising because the archaeological re- Supportinginformationisavailableonlineathttp://www.genetics.org/lookup/suppl/ cord shows that Neanderthals and early modern humans doi:10.1534/genetics.112.148213/-/DC1. 1Correspondingauthor:513ParnassusAve.,S965,SanFrancisco,CA94143. coexisted only in Europe and western Asia. Green and E-mail:[email protected] colleagues hypothesized that Neanderthals and modern 2Presentaddress:Sequenom,Inc.,SanDiego,CA92121. 3Presentaddress:23andMe,MountainView,CA94043. humanscameintocontactandinterbredintheMiddleEast Genetics,Vol.194,199–209 May2013 199 (cid:1)50–80 KYA, prior to the divergence of modern-day Euro- pean and Asian populations. Green et al. (2010) presented three kinds of evidence in favor of interbreeding. First, they found (using D-statistics, a new measure of genetic similarity introduced in that arti- cle) that the three sampled non-African genome sequences (fromaFrench,aHanChinese,andaPapuaNewGuinean) aremoresimilartotheNeanderthalsequencethaniseither ofthetwosampledAfricansequences(fromaSanandaYor- uban).Second,theyidentifiedseveralhaplotypesthatarein lowfrequencyinEuropeans,absentfromAfricans,andpres- entintheNeanderthalsequence,whichsuggeststhosehap- lotypes were derived from Neanderthals. Third, they found manymoregenomicfragmentsinaEuropeangenomethan in an African genome that have low divergence to the Ne- anderthal genome. Admixture between modern humans and Neanderthals within the past 100,000 years (Kyr) is only one possible explanationfortheseD-statisticpatterns.Greenetal.noted thatanotherpotentialexplanationisancientpopulationsub- divisionwithinAfricabeforebothNeanderthalsandmodern humans left Africa (cf. Green et al. 2010, figure 6). If there had been long-lived (e.g., .500 Kyr) population structure withinAfrica,andbothNeanderthalsandnon-AfricanAMH camefromthesame“source”subpopulation,thenNeander- thals would be more similar to non-Africans in the absence of any recent admixture between AMH and Neanderthals (see Figure 1A). This intuitive argument was confirmed by thesimulationstudiesofDurandetal.(2011)andEriksson andManica(2012),butthesestudiesdidnotaccountforthe other two lines of evidence summarized above. Two other studies have shown that the ancient-subdivision model is incompatible with other aspects of the data. Yang et al. Figure1 Simplifiedversionsofmodelsofancientpopulationstructure(A) (2012) demonstrated that recent admixture (Figure 1B) orrecentadmixture(B)thatcanexplaintheobservedlevelsofdivergence could be distinguished from ancient subdivision (Figure between modern human genomes and the draft Neanderthal genome. 1A) by computing the frequency spectrum of modern Here T1 is the time when Neanderthals and modern humans first split, humans, conditioned on the Neanderthal sequence having T2isthetimewhenAfricanandnon-Africanmodernhumanpopulations split,andT3isthetimewhenNeanderthalsmixedwithmodernhumans. the derived allele and an African sequence having the an- cestral allele. This double conditioning enriches for alleles introduced by recent admixture if it occurred. Yang and D-statistics among pairs of individuals from the same two colleagues found that the doubly conditioned frequency populationsandobtaingreaterstatisticalpowerbycombin- spectruminEuropeansandinEastAsiansisconsistentwith ingestimatesamongallpairs.ThesecondmethodisanLD- recent admixture, not with ancient subdivision. Separately, basedmethodsimilartooneintroducedbyWall(2000)and an analysis of the extent of LD at closely linked sites also Plagnol and Wall (2006) for identifying putatively intro- concludedthatthedatawereconsistentwithrecentadmixture gressed regions in modern human genomes. We use the andnotwithancientsubdivision(Sankararamanetal.2012). draftNeanderthalgenometoidentifysegmentsinthemod- In this study, we revisit the question of Neanderthal ern humangenome thatwerederivedfromadmixture with admixture,usinganexpandeddatasetof42high-coverage Neanderthals. This method is similar to the one used by (.45·) modern human genomic sequences, and we take Greenetal.(2010)butislessrestrictiveandallowsquanti- advantage of the recent high-coverage Denisova genome fication of the differences in the number of admixed seg- (Meyer et al. 2012) to obtain more refined estimates of ments in different populations. admixture proportions. We use two complementary meth- Using both of these methods, we show there was more odsofanalysis.OneistheD-statisticmethodintroducedby Neanderthal admixture into East Asian populations than Green et al. (2010). D-statistics reflect site-by-site differen- into European populations. This conclusion is consistent ces.Becausewehavemultipleindividualsfromeachofsev- with that of Meyer et al. (2012), which was based on the eral populations, we can quantify the extent of variation in analysis of a smaller number of modern human sequences. 200 J.D.Walletal. Byusingthehigh-coverageDenisovagenome,weareableto Table 1 Forty-two individual genome sequences from Complete showthattheadmixturerateintoEastAsiansis40%higher Genomicsincludedinourstudy than into Europeans. We conclude that admixture between ID Population ID Population Neanderthals and modern humans did not occur at a single NA06985 CEU NA21732 MKK timeandplace,assuggestedbyGreenetal.(2010).Someofit NA06994 CEU NA21733 MKK had to have occurred after the separation of East Asians and NA07357 CEU NA21737 MKK Europeans.Further,weshowthattherewassignificantNean- NA10851 CEU NA21767 MKK NA12004 CEU NA18940 JPT derthal admixture into the Maasai population of East Africa, NA12889 CEU NA18942 JPT probablybecauseofsecondarycontactwithanon-Africanpop- NA12890 CEU NA18947 JPT ulationratherthanadmixturedirectlyfromNeanderthals. NA12891 CEU NA18956 JPT NA12892 CEU NA20502 TSI NA18526 CHB NA20509 TSI NA18537 CHB NA20510 TSI Materials and Methods NA18555 CHB NA20511 TSI Complete genomicsdata NA18558 CHB NA18501 YRI NA20845 GIH NA18502 YRI We downloaded data from 69 publicly available genome NA20846 GIH NA18504 YRI sequences from the Complete Genomics (CGI) website NA20847 GIH NA18505 YRI NA20850 GIH NA18508 YRI (http://www.completegenomics.com/public-data/).Com- NA19017 LWK NA18517 YRI plete Genomics sequenced a Yoruba (YRI) trio, a Centre NA19020 LWK NA19129 YRI d’Etude du Polymorphisme Humain (CEPH)/Utah (CEU) NA19025 LWK NA19238 YRI pedigree family of 17 family members, a Puerto Rican NA19026 LWK NA19239 YRI (PUR) trio, and a diversity panel from 10 different popula- tions.Combiningthesedatasetsandusingonlynonrelated, nonadmixed individuals, we have a sample size of 42 indi- inSeptember2011.Thisversionwasmappedtothehuman vidualsrepresentingninedifferentpopulations(Table1).In referencegenomehg19.Wealsodownloadedthechimpan- additionto36membersofthediversitypanel,wealsoused zeegenomepantro2alignedtohg19fromtheUniversityof theparentsfromtheYRItrioandthematernalandpaternal California, Santa Cruz (UCSC) Genome Browser (http:// grandparents in the CEU pedigree. The individual genomes hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/). were sequenced to a minimum 45-fold coverage (Drmanac The Neanderthal sequence was obtained by pooling reads et al. 2010). The eight populations are Utah residents with fromthethreeVindijabones(SLVi33.16,SLVi33.25,andSL Northern and Western European ancestry from the CEPH Vi33.26)thatwerealignedtothereferencehumangenome collection (CEU); Han Chinese from Beijing, China (CHB); (Green et al. 2010). The Neanderthal data were down- GujaratiIndiansfromHouston(GIH);JapanesefromTokyo loaded from the UCSC genome browser (http://genome. (JPT); Luhya from Webuye, Kenya (LWK); Maasai from ucsc.edu/Neandertal/). To match the filtering used in the Kinyawa, Kenya (MKK); Toscani from Italy (TSI); and Yor- original Green et al. (2010) study, we used only sites with uba from Ibadan, Nigeria (YRI). Samples from three other a mapping quality score (MAPQ) of at least 90 and a se- populations were also available from Complete Genomics, quencequality.40.Onaverage,thecoverageoftheNean- those of Mexican ancestry in Los Angeles (MXL), African- derthal genome was (cid:1)1.3-fold. We kept only sites that had Americans from southwest Arizona (ASW), and Puerto one, two, or three reads. Ricans from Puerto Rico (PUR), but these were excluded After filtering out any insertions, deletions, or ambigu- fromouranalysisbecauseofrecentintercontinentaladmix- ously called sites in the Complete Genomics data, we ture. All genomic data were downloaded from Complete merged them with the chimpanzee and Neanderthal Genomics’ ftp site (ftp://ftp2.completegenomics.com/). genomes. We kept only sites that had no more than two We used two separate pipelines for filtering and processing alleles in any of the human genomes and at which alleles the data, optimized for the different analyses performed were called for each human, the chimp, and the Neander- (see below). thal. Furthermore, we considered only transversion differences. D-statisticfiltering We also obtained the high-coverage Denisova genome For the D-statistic analyses, each individual genome was from Meyer et al. (2012). The genome was aligned to the aligned with the human genome assembly hg19 for consis- human referencegenome (hg19)andthe averagecoverage tency with the available assembly of the Neanderthal ge- was (cid:1)30x. We filtered out all sites that had ,16 reads or nome. Since our results were somewhat unexpected, we .46reads.Wemergedthesedatawiththedatafromanal- preparedthedataforanalysisintwodifferentwaystocheck ysis A to compute the D-statistic and f-statistic. for consistency. We denote these analysis A and analysis B. ForanalysisB,weredownloadedthegenomicdatafromthe For analysis A, we used the release of the file format Complete Genomics website (ftp://ftp2.completegenomics. version 2.0 (software version 2.0.0.26) that was generated com/, software version 2.0.2.15, file format version 2.0, NeanderthalAncestryinEurasians 201 February2012).Thesesequenceswerealignedtohg18.We regionsthatincludedsiteswhereoverhalfoftheindividuals applied a less stringent filter of the Neanderthal data: the are heterozygous and only one homozygous genotype is filteringformappingqualityandsequencequalityremained present.Thecoordinatesfortheseregionsareavailablefrom the same as in analysis A, but there were no restrictions the authors upon request. on the number of reads per site. Finally, instead of consid- Denisova sequence reads (Reich et al. 2010), mapped ering the chimp genome as the outgroup, we used the an- to the human reference genome hg18, were downloaded cestralallelesdefinedbythe1000GenomesProjectfromthe from the UCSC genome browser (http://genome.ucsc.edu/ Enredo-Pecan-Ortheus(EPO)pipeline(Patenetal.2008a,b) cgi-bin/hgTrackUi?db=hg18&c=chrX&g=bamSLDenisova). (data downloadedfrom ftp://ftp.1000genomes.ebi.ac.uk/). ConsensusNeanderthalsequencegeneratedfromthreebones We refer to this outgroup as the reconstructed common andalignedtothehumanreferencegenomehg18wasdown- ancestor (RCA). loadedfromtheEnsemblgenomebrowser(http://neandertal. For samples from any two populations compared, we ensemblgenomes.org/data_info.html). Samtools 0.1.18 (Li filtered out any insertions, deletions, or ambiguously called etal.2009)wasusedtoconverttheBAMfilesintoapileup sites. These genomic samples were then merged with the alignment (mpileup arguments: -B -q5 -Q30) of each an- Neanderthal genome and the RCA outgroup. This differs cienthominin genomeand hg18forthe region of interest. from analysis A, where all populations were merged with To compare modern human sequence tracks to ancient theNeanderthal,Denisova, andchimpgenomepriortoany hominin sequences, hg19 coordinates of interest were comparisonsbetweenpopulations.Weconsideredonlysites converted to hg18 coordinates using the UCSC genome where the difference between the ancestral allele from the browser tool liftOver and extracted from the pileup align- RCA and the alternate allele is a transversion, as we did in ments via custom perl scripts. To further compare the hu- analysis A. man sequences to sequences of other primate genomes, another custom perl script was used to extract the same LD-basedanalysis filters hg19 coordinates of interest from a subset of the genomes Since the LD-based analyses primarily utilize patterns of in the UCSC MultiZ alignments found at http://hgdownload. extantgeneticvariation(andonlysecondarilyusethedraft cse.ucsc.edu/goldenPath/hg19/multiz46way/.Computa- Neanderthal genome), we aligned variant calls to the tionswere performed using the University of California, updated human genome assembly (hg19), included both San Francisco, Biostatistics High-Performance Comput- transitions and transversions, and imposed more stringent ing System. filterstothrowoutrepetitiveregions.Specifically,acustom series of Perl/C scripts and cgatools v1.3.0.9 were used to D-statisticsandestimatesofadmixturerates get a common set of variants from each individual. Using the CGI’s variant file, all polymorphic regions containing D-statistics,introducedbyGreenetal.(2010),aresummary SNPs were identified and reconstructed according toCGI’s statisticsforgenomesequencesfromfourpopulations. Two descriptions. These regions were then filtered for SNPs in populations, P1 and P2, are compared to a test population, P . The fourth population P is used as an outgroup to de- such a way that both alleles were known for a given in- 3 4 terminewhichalleleisancestralateachsite.Inourcase,P dividualandwerenotpartofacomplexvariant(forexam- 4 isthechimpanzeereferencesequence(pantro2)denotedby ple, a SNP on one haploid phase and a deletion on the C,andP istheNeanderthalsequence,denotedbyN.P and other phase). We then pooled all unique SNP positions 3 1 P aretwohumansequences.Thechimpreferencesequence from the full panel of samples and removed all SNPs lo- 2 is assumed to have the ancestral allele, denoted by A. D is cated within repeats and segmental duplications with computed only for sites at which both of the Neanderthal a minimum size of 50 bp. Structural variants (dgv track on UCSC), self chain (identity ,90%, UCSC self-chain andonebutnotbothofthehumansequenceshaveadiffer- entallele,assumedtobederivedanddenotedbyB.Thatis, track), segmental duplications (UCSC), microsatellites only those sites with configurations ABBA and BABA are (UCSC), simple tandem repeats (UCSC), and repeat maskedsequence(UCSC)werealsoexcluded.Thefinallist used,wheretheorderisP1,P2,P3,P4.Therequirementthat ofSNPswasthenusedbyCGI’s“snpdiff”tooltoextracteach two copies of both the derived and the ancestral alleles be sample’sbase callsrelative tothe humanreference genome present greatly reduces the effect of sequencing error (Durand et al. 2011). (hg19, Build 37). The snpdiff output was then reformatted When only a single sequence from each population is to ms, PLINK, and other text-based formats for further available, analyses. Subsequently,weidentifiednumerousregionswhereall/ n 2n DðP ;P ;P ;P Þ¼ ABBA BABA; (1) most individuals had heterozygous SNP calls but only one 1 2 3 4 n þn ABBA BABA homozygous genotypewaspresent.These regions likely reflect either alignment errors due to the Complete wheren andn arethenumbersofsiteswitheachof ABBA BABA Genomics short-read sequencing technology or errors in the two configurations. When diploid sequences from each the human reference genome sequence. We excluded all individual from P and P are available, then 1 2 202 J.D.Walletal. P P ð12pð1ÞÞpð2Þ2 pð1Þð12pð2ÞÞ DðP ;P ;P ;P Þ¼ Pi i i P i i ; (2) 1 2 3 4 ð12pð1ÞÞpð2Þþ pð1Þð12pð2ÞÞ i i i i i where pð1Þ andpð2Þ are the frequencies ofthe derived allele i i (0,0.5,1)intheindividualinP andP ,respectivelyatsite 1 2 i. Equation 2 is equivalent to sampling one of the chromo- somesatrandomfromP andP andthenusingEquation1. 1 2 Greenetal.(2010)andDurandetal.(2011)showedthat theexpectedvalueofDis0ifP andP formacladeandP 1 2 3 istheoutgroup.Thesearticlesalsoshowedthatiftherewas admixturefromP intoP ,thenE(D).0.Themagnitudeof 3 2 D depends on the admixture proportion f and on the pop- ulation divergence times and various effective population Figure2 Schematicofamodelofrecentandancientpopulationstruc- turewithoutadmixtureusedinsimulations.Seetextfordetails. sizes. Reich et al. (2010) showed that if there is a sister group of P , which we call P , that has not admixed with P , P , 3 5 1 2 TheelementsofM areD(G ,G ,N,C),whereG andG or P , then it is possible to estimate f directly. In our case, 1 1,i 3,j 1,i 3,j 3 are the ith and jth individuals in G and G (i = 1,...,k ; P is the Denisovan genome. To estimate f, we define 1 3 1 5 j=1,...,k ).TheelementsofM areD(G ,G ,N,C).M SðP ;P ;P ;P Þ to be the numerator of either Equation 1 3 2 2,i 3,j 1 1 2 3 4 has k rows and k columns, and M has k rows and k or Equation 2. Then 3 1 2 3 2 columns.FromM andM theaverageD’sarecomputed,D 1 2 1 ^f ¼SSððPP1;;PP2;;PP5;;PP4ÞÞ: (3) caanndnDot2.beThuesepdrboebcleamuseisthtoeetleesmt ewnhtestwheitrhiDn1e=achDm2.aAtritx-taerset 1 3 5 4 not independent of each other and because the same refer- The intuition behind this estimator is that the denomina- ence population (G3) is used to compute both matrices. In- tor quantifies the excess coalescent events that occur stead, we combine M1 and M2 into a single matrix with k3 between lineages in P3 and P5 because they are sister rowsandk1+k2columns.Thenwerandomizethecolumns groups. Lineages in P2 that are introduced by admixture and compute D1 for the matrix containing the first k1 col- have the same coalescent history as all lineages from P3. umns andD2 forthe matrix containingthe last k2columns. Hence, the ratio is the fraction of lineages in P2 that trace ThenwecomparetheobservedD1–D2withthedistribution their ancestry to P because of admixture (Reich et al. of differences from the randomized matrices. We used 3 2010).Inourapplicationofthismethod,weareassuming a two-tailed test and 1 million replicates for each test. that there is no admixture from Denisovans (P ) into the Test 2 is similar to test 1, but because we compare only 5 other populations (P1,...,P4). Although Skoglund and G1andG2,asubsetofonepopulationisusedinplaceofthe Jacobsson (2011) have argued that there was admixture referencepopulation,G3.Forthepopulationwiththelarger from Denisovans into East Asians, our results described samplesize(sayG1),wecreatearandompartitionðGa1;Gb1Þ below did not find evidence of this admixture for the subject to the constraint that they differ in number by no HanChineseandJapanesesamplesweanalyzed.Foranal- more than one. For M1, we compute D for all pairs of indi- ysis A, we explored the variation in estimated D-statistics viduals in Ga1 and G2. The elements of M2 are and admixture rates (f) for all pairs of individuals of dif- DðGa1;i;Gb1;j;N;CÞ, where Ga1;i and Gb1;j are the ith and jth ferent human populations. For analysis B, since we did individuals in the two subpopulations created by the parti- not include the Denisova genome, we estimated only D- tion. Test 1 is then applied to M1 and M2. statistics. We also calculated the f-statistics for each pair of indi- viduals. Using the same randomization tests as described above,wedeterminedwhetherthereweresignificantdiffer- Randomizationtests ences between populations in estimates of the admixture We computed D for each pair of individuals, both within rate. Significant differences observed using the admixture populations and between populations. We developed two rate suggest that the effect is truly due to the Neanderthal randomization tests of statistical significance. Both are sim- and not admixture with Denisovans. ilar to the Mantel test. Test 1 tests whether the average D Identifyingputativearchaic humanregions computed for one pair of populations is significantly larger thanforanotherpair,andtest2testswhethertheaverageD Previousworkhasshownthatarchaicadmixtureoftenleads for a pair of populations differs significantly from 0. tolong,divergent haplotypes at lowfrequency (Wall2000; For test 1, we start with sequences from three human PlagnolandWall2006).WedefinetwoSNPstobe“congru- populations, G , G , and G , each containing k , k , and k ent” if their diploid allele counts (i.e., zero, one, or two 1 2 3 1 2 3 diploid sequences. We compute two matrices of D values. counts of a particular allele) across individuals are NeanderthalAncestryinEurasians 203 completelycorrelated(i.e.,r2=1).Wedefinethemaximum IdentifyingputativeNeanderthalregions numberofpairwisecongruentSNPstobel anddenotethe d Toidentifywhichofthe2254regionsdescribedabovewere collection of rarer (minor allele frequency # 0.5) alleles at likely to reflect recent Neanderthal admixture, we imposed each of these pairwise congruent sites to be the putative the following additional criteria on the putative archaic archaic haplotype. From the filtered Complete Genomics human haplotypes: data, we then identified all regions from 8 to 100 kb in length where l $ 30 and l /S $ 0.1, where S is the total 1. The Neanderthal allele must be called at $12 SNPs and d d number ofpolymorphic sitesin the region. When identified match the putative archaic haplotype at $70% of these regions overlapped, we took the region with the largest SNPs. value of l /S. We also required that neighboring regions 2. The Neanderthal allele and the chimp allele must be d withputativearchaichaplotypescongruentwitheachother called at $8 SNPs and the Neanderthal allele must be be separated by at least 200 kb, to avoid double counting derived (relative to chimp) at $60% of these sites. longarchaichaplotypes.Atotalof2254regionswereiden- 3. Theputativearchaichaplotypemustbeatlowfrequency tified. Of these, 411 were private to the non-African (,5%) in the sub-Saharan African samples. samples. The motivation for criterion 1 is obvious, and we note that To estimate what proportion of these regions might be amorestringentcutoffwasnotusedduetothepoorquality false positives, we simulated whole-chromosome sequence of the Neanderthal genome sequence. Criterion 2 was data (Chen et al. 2009) under a model that incorporated implemented to cut down on regions that reflect shared both recent (intracontinental) and ancient (intercontinen- ancestral polymorphism between modern humans and tal)populationstructure(Figure2).Specifically,weassume Neanderthals; it is based on an observation of Noonan a panmictic ancestral population split into two daughter etal.(2006)thatrecentNeanderthaladmixturewillleadto populationsattimeT =0.6(usingthestandardcoalescent 0 an increase in SNPs where Neanderthals have the derived scaling of 4N generations), with (symmetric) scaled migra- allele.Finally,criterion3reflectsourpriorbeliefthatadmix- tion rate of M = 5. At time T = 0.05 – 0.053, one of the 0 1 ture with Neanderthals did not occur in Africa andthat the ancestralpopulations(i.e.,the“non-African”one)experien- presence of Neanderthal alleles in Africa could reflect only cesapopulationbottleneckresultingina100-foldreduction morerecentmigrationpatterns.Atotalof226regionswere in population size. Then, at time T = 0.045, each popula- 2 identified that meet these additional criteria. We note in tion splits into two descendant populations, connected by passing that the specific cutoffs used in criteria 1–3 are migrationrateM =8.Whilearbitrary,thismodelattempts 1 somewhat arbitrary, but our qualitative conclusions are to incorporate the major features of human demographic unchanged under a range of similar criteria (results not history, including intra- and intercontinental population shown). structureandabottleneckinthehistoryofnon-Africanpop- Weimplementedasimple permutation testtoassess the ulations, and is similar to the model used by Yang et al. statisticalsignificanceoftheobserveddifferenceinfrequen- (2012).Theresultsdescribedbelowarequalitativelysimilar cies of Neanderthal regions in East and South Asians and if other plausible values for the times and migration rates Europeans. Specifically, we kept the presence/absence of are used (results not shown). Using N = 10,000 and an Neanderthal regions for each individual constant and averagegenerationtimeof25years,eachunitofscaledtime randomly permuted the geographic label (i.e., “European” corresponds to 1 million years. vs.“EastAsian”)ofthesample100,000times.Similaranal- We simulated 30 different 100-Mb chromosomes, using yses were used to compare the frequency of Neanderthal the model described above with mutation parameter u = regions in Maasai vs. other sub-Saharan African samples. 3.5 · 1024/bp, recombination parameter r = 4 · 1024/bp, and 10 individuals sampled from each of the four extant IdentifyingputativeDenisovanregions populations. The simulated number of segregating sites was substantially higher than the actual number in our filtered Excludingthe226Neanderthalregionsidentifiedabove,we screened the remaining 2028 putative archaic regions for data. Since average l values are positively correlated with d Denisovan admixture, using the same criteria as for Nean- levelsofdiversity,thesimulatedl valuesarehigheronaver- agethanexpectedinrealdata,anddourchoiceofuisconser- derthals. Thirty total regions fit these criteria. vative.Also,standardestimatesofraregenerallyhigherthan EstimatinglocalancestryintheMaasai the value wetook (Myers etal. 2005), which is also conser- vativeforourpurposes.Wethentabulatedthetotalnumber We took the filtered Complete Genomics data described at ofregionswithl $ 30,l /S$ 0.1,andwithdivergenthap- thestartofthissectionandestimatedSNPallelefrequencies d d lotypeSNPsprivatetothesimulatednon-Africansamples.We separately in the 13 European samples and the 13 non- identified a total of 3 regions that satisfied these criteria, MaasaiAfrican samples.These wereusedasproxiesforthe compared with 411 regions that were identified from the (unknown) non-African and African ancestral populations. actualdata.Thisleadstoanestimateofafalsediscoveryrate We then included only those SNPs with allele frequencies ofq, 0.01. thatdifferbyatleast0.3inouranalyses.Wecalculatedthe 204 J.D.Walletal. Figure3 Summaryofsignificancetestsforav- eragevaluesofD.Positivevaluesindicatethat thesecondsequenceismoresimilartotheNe- anderthalgenomethanthefirstsequence.Inall parts, the box plots indicate the range of D values obtained for pairs of individuals from the populations indicated. A and B are box plots of individual D-statistics computed for each individual from the specified population compared with each Yoruban. P-values are from therandomizationtest, test 1, of signifi- cant differences in the average D values for different pairs of populations. C and D show box plots of individual D-statistics computed foreverypairofindividualsinthespecifiedpop- ulations. P-values are from the randomization test,test2,ofsignificantdifferencesoftheav- erageDfrom0.SeealsoTableS2. likelihoodofeachancestralconfiguration(i.e.,zero,one,or remainstobeestablished.Also,wefindevidenceforasmall two alleles inherited from the non-African population) sep- but significant amount of Neanderthal admixture into the arately for each SNP. Then, over sliding windows of 1 Mb, Maasai genomes (P (cid:1) 0.03, Table S4). When compared to weformedacompositelikelihoodbymultiplyingtogetherall the Yoruba, the Maasai have a higher average D than the of the single-SNP likelihoods contained in the window and Luhya (Figure 3B, Table S4). When the Maasai are com- tabulated which ancestral configuration had the highest paredtoallotherAfricansamples,theaverageDispositive (composite)likelihood.ForeachSNP,wethenusedmajority (Figure 3D). In addition, when East Asians and Europeans ruletomakeancestrycalls,usingallwindowscontainingthe are compared to the Maasai, the average D’s are somewhat SNP in question. SeeWalletal. (2011) forfurther details. lowerthanwhentheyarecomparedtoeithertheYorubaor theLuhya.TheP-valuesshowninFigure3,AandBarefrom test 1 and those in Figure 3, C and D are from test 2. Results TableS1,TableS2,andTableS3showestimatedvalues D-statistics andestimatesoff off.Theestimatesoftheadmixturerateshowthatwhenwe incorporate the Denisovan genome into our analysis, the TheD-statisticsandestimatesoffwecomputedaresumma- admixture rate between East Asians and Neanderthals rizedinFigure3andSupportingInformation,FileS1,Table remains significantly higher than the admixture rate be- S1,TableS2,TableS3,TableS4,TableS5,TableS6,Table tween Europeans and Neanderthals (P (cid:1) 0.001, Table S7). S7, Table S8, Table S9, Figure S1, Figure S2, Figure S3, The Maasai remain significantly more genetically similar to Figure S4, Figure S5, Figure S6, Figure S7, and Figure S8. the Neanderthals when compared to the Luhya (P (cid:1) 0.03, Several features of the results are notable. First, we find TableS7),buttheobservedsignificantdifferencefortheD- evidence for more Neanderthal admixture into the East statistic when comparing the Maasai and the Yoruba is not Asian samples than into the European samples (P = observedforthef-statistic(P(cid:1)0.34,TableS7),whichprob- 0.001)—consistently higher D values result when East ablyreflectsthelowerpowerofusingfasateststatistic.The AsiansarecomparedtooneoftheAfricanpopulationsthan admixtureratesfortheSouthAsiansgivethesameresultsas when Europeans are compared (Figure 3A, Table S4), and those for the D-statistic (Table S9). theaverageDispositivewhenEastAsiansarecomparedto Europeans(Figure3C,TableS5).InanalysisB,comparisons Identifying“Neanderthal haplotypes” withtheSouthAsiansamplesareintermediatewithrespect to the European andEast Asian samples but not in analysis Our new method for identifying introgressed Neanderthal A, indicating that the South Asian sample differs from the fragments in human populations detected 226 different East Asian ones but the degree of similarity to Europeans putative Neanderthal regions. The relative frequencies of NeanderthalAncestryinEurasians 205 Figure4 DistributionofthenumberofputativeNeander- thal regions for each Eurasian individual. European genomes are colored in green, East Asian genomes are colored in red,and South Asiangenomesare colored in black. these putative Neanderthal haplotypes in the 42 sampled from Denisovans. We identified a total of 30 regions, all at modern human individuals then provide estimates of the low frequency, with no significant difference in frequency relativecontributionsofNeanderthalDNAtothegenepools between populations. of contemporary human populations. We found that on Maasaiadmixture average the “Neanderthal haplotypes” were at higher fre- quency in the East Asians than in the Europeans (9.6% vs. PreviousgeneticstudieshavesuggestedthattheMaasaimay 6.4%;P=3.0·1024,permutationtest),consistentwiththe be an admixed population with a substantial proportion of D-statistic results presented in Figure 3 (Figure 4). We also non-African ancestry (Henn et al. 2011). If the non-African found evidence for a small, but statistically significant, Ne- ancestry were dueto recent(i.e., post-Neanderthal) admix- anderthal contribution to the genomes of the Maasai (P = ture, then the observation of Neanderthal ancestry in the 4.9 · 1024), but did not find a significant difference in Maasaiwouldnotbeunexpected.Alternatively,spatiallyex- Neanderthal haplotype frequency between the East Asian plicit models of ancient population structure might explain and South Asian samples (P . 0.05). the greater similarity between Maasai and Neanderthals relative to other sub-Saharan African groups (A. Manica, Additional testofancientpopulationstructure personalcommunication).Onedifferencebetweentheseal- As reviewed in the Introduction, there is already evidence ternative explanations is what they predict about the pat- againstthehypothesisthattheextrasimilarityofnon-African ternsofsimilarityacrossthegenomesofMaasaiindividuals. populations to Neanderthals is accounted for by ancient Under a model of recent admixture, we expect Maasai populationsubdivision.Toexplorethispointfurther,wetook genomes to show large, distinct blocks of sequence with the 411 regions from our whole-genome analyses that were different genetic patterns, corresponding to blocks with identifiedpurelyonthebasisoftheirLDpatterns(i.e., with- non-African vs. African ancestry. The average size of the out usinganyinformationfrom theNeanderthalgenomese- non-African blocks (in morgans) is roughly the inverse of quence).Then,foreachnon-Africanindividual,wecalculated thetime(ingenerations)sinceadmixture.Incontrast,under the D-statistic for those regions where the individual con- a model of ancient admixture the similarity of Maasai tained a rare, diverged haplotype. If this haplotype were re- genomestotheNeanderthalgenomewillbespreadthrough- cently inherited from Neanderthals, we would expect the D outthegenomebecausetheadmixturehappenedmuch lon- valuestobestronglypositive.Ifinsteadtherewerenorecent ger ago. admixturebetweenmodernhumansandNeanderthals, then To distinguish between these two possibilities, we there is no a priori reason why these regions would show D employed acomposite-likelihood–based approachtoidenti- values significantly different from 0. Recombination acting fyingAfricanandnon-Africanregionsofancestryacrossthe over the past 300 Kyr would break up local patterns due to genomes of the four Maasai samples (Wall et al. 2011). shared ancestral polymorphisms to scales ,0.01 cM (i.e., Briefly, we used the European (CEU and TSI) and other ,10 kb on average). The D values that we observe are African(YRIandLWK)samples(Table1)toestimateallele strongly positive (averageD = 0.594,compared withanav- frequenciesinnon-AfricanandAfricanancestralpopulations erageD=0.068forthewholegenome),providingadditional and then estimated the number of alleles inherited from evidencethatmostoftheunusualhaplotypesfromthese411 eachancestralpopulationateachSNPinthegenome.These regionsareindeedtheresultofrecentintrogressionfromthe extant samples may not be perfect proxies for the true an- Neanderthalgenepool(P ,, 1028, Figure5). cestralpopulations,butthequalitativeresultspresentedbe- low are likely to be valid. Identifying “Denisovanhaplotypes” In summary, we estimate an average of (cid:1)30% non- Excludingthe226Neanderthalregionsdescribedabove,we African ancestry in each Maasai genome, and the sizes used the same criteria to identify regions likely inherited oftheancestralblocksareconsistentwithadmixturethat 206 J.D.Walletal. Figure5 BoxplotshowingtheaverageDacrossthewholegenomesof thenon-AfricanindividualscomparedwiththeaverageD(forthesame individuals) across regions identified as having unusual patterns of LD (i.e.,putativearchaicregions). happened (cid:1)100generations ago (Figure6A). Wethen par- titionedeachMaasaigenomeintoregionswithzero,one,or Figure6 RecentandancientadmixtureintheMaasai.(A)Representative two inferred African alleles and calculated D separately for plotofthenumberofestimated“African”allelesacrossthefirst30Mbof each partition. We found that the D values are significantly chromosome1inoneoftheMaasaigenomes.(B)EstimatedvaluesofD forportionsofthegenomeestimatedtocontainzero,one,ortwo“non- more negative with increasing numbers of inferred non- African”alleles. Africanalleles(P=2.0·1024;Figure6B).Thisobservation provides strong support for recent non-African gene flow Wewerenotabletoconfirmthe conclusionofSkoglund into the Maasai, with the non-African alleles bringing with and Jakobsson (2011) that there was Denisovan admixture them low levels of Neanderthal ancestry. into East Asians. We did not detect any difference in the number of apparent Denisovan segments in Europeans and East Asians. The East Asian genomes analyzed, however, Discussion werefromnorthernEastAsia(BeijingandTokyo),notfrom Ourresultsconfirmandreinforceseveralconclusionsabout southern East Asia where Skoglund and Jakobsson found admixture between Neanderthals and the ancestors of the strongest signal of admixture with Denisovans. modern humans. Using a much larger number of high- OurresultsandthoseofMeyeretal.(2012)implythat coverage genome sequences than were previously analyzed the relatively simple admixture scenario proposed by for this purpose and using two complementary methods of Green et al. (2010) needs to be altered. At least two sep- analysis(D-statisticsanddetectionofintrogressedNeander- arate episodes of admixture between Neanderthals and thal segments), we confirm the conclusion of Meyer et al. modern humans must have occurred, and at least one of (2012) that East Asians (Han Chinese and Japanese) are thoseepisodesmusthaveoccurredaftertheseparationof more similar to the published Neanderthal sequence than the ancestors of modern Europeans and East Asians. are Europeans. Because we have analyzed more modern Rather than have two distinct episodes of admixture, it humansequencesthanMeyeretal.(2012)did,weareable seems more plausible that admixture took place over toshowtheextentofvariationwithinbothAsianandAfrican a protracted period 50–80 KYA. During that period the populations. We also confirm the conclusions of Yang et al. ancestors of Europeans diverged and subsequently expe- (2012)andSankararamanetal.(2012)thatthesimilarityof rienced less admixture than the ancestors of East Asians. bothEuropeansandEastAsianstoNeanderthalsistheresult This scenario is consistent with the simulation models of of recent admixture and not ancient population subdivision. Currat and Excoffier (2011)and Skoglundand Jakobsson Finally, we used the high-coverage Denisova sequence of (2011). Meyer et al. (2012) to determine that the admixture rate If this scenario is correct, the time of separation of the (f) into East Asians is(cid:1)40% higherthaninto Europeans. ancestorsofmodernEuropeanandEastAsianpopulationsis NeanderthalAncestryinEurasians 207 constrained. Since there is no archaeological record of Literature Cited Neanderthals in the past (cid:1)30 Kyr, it follows that the sepa- Chen,G.K.,P.Marjoram,andJ.D.Wall,2009 Fastandflexible ration of Europeans from East Asians had to have occurred simulationofDNAsequencedata.GenomeRes.19:136–142. before Neanderthals went extinct. Consequently, estimates Currat,M.,andL.Excoffier,2011 Strongreproductiveisolationbe- of East Asian–European population divergence of ,30 KYA tweenhumansandNeanderthalsinferredfromobservedpatterns (Gutenkunst et al. 2009; Gravel et al. 2011) are unlikely to ofintrogression.Proc.Natl.Acad.Sci.USA108:15129–15134. becorrect. Thistimeframeisalso supportedbya40-to50- Drmanac,R.,A.B.Sparks,M.J.Callow,A.L.Halpern,N.L.Burns etal.,2010 Humangenomesequencingusingunchainedbase KYAmodernhumanfossilrecentlyfoundinChina(Fuetal. readsonself-assemblingDNAnanoarrays.Science327:78–81. 2013). Duarte, C., J. Mauricio, P. B. Pettitt, P. Souto, E. Trinkaus et al., Ourtwoanalysesyieldedslightlydifferentresultsforthe 1999 The early Upper Paleolithic human skeleton from the Gujarati (South Asian) samples. However, it would not be Abrigo do Lagar Velho (Portugal) and modern human emer- surprisingifthetruelevelofNeanderthalancestryinSouth genceinIberia.Proc.Natl.Acad.Sci.USA96:7604–7609. Durand, E. Y., N. Patterson, D. Reich, and M. Slatkin, AsianswasintermediatebetweenEuropeansandEastAsians 2011 Testing for ancient admixture between closely related because previous studies have shown gradients in genetic populations.Mol.Biol.Evol.28:2239–2252. ancestryacross Eurasia(Rosenbergetal. 2002). Eriksson, A., and A. Manica, 2012 Effect of ancient population Our finding of Neanderthal admixture into the Maasai structureonthedegreeofpolymorphismsharedbetweenmod- was initially surprising, given the lack of evidence that ernhumanpopulationsandancienthominins.Proc.Natl.Acad. Sci.USA109:13956–13960. NeanderthalsevercrossedintoAfricaorthattheancestors Finlayson,C.,2004 NeanderthalsandModernHumans:AnEcolog- oftheMaasaiwereeverintheMiddleEast.Althoughdirect ical and Evolutionary Perspective. Cambridge University Press, contactbetweenthetwogroupsinthepastistheoretically Cambridge,UK. possible, our results are more consistent with a scenario Fu, Q., M. Meyer, X. Gao, U. Stenzel, H. A. Burbano et al., involving recent admixture between the ancestors of the 2013 DNAanalysisofanearlymodernhumanfromTianyuan Cave,China.Proc.Natl.Acad.Sci.USA110:2223–2227. Maasai and one or more (historically) non-African groups Garrigan,D.,Z.Mobasher,S.B.Kingan,J.A.Wilder,andM.F.Ham- with Neanderthal ancestry several thousand years ago. mer, 2005a Deep haplotype divergence and long-range linkage This interpretation is broadly consistent with recent find- disequilibrium at xp21.1 provide evidence that humans descend ings of African admixture into Middle Eastern and South- fromastructuredancestralpopulation.Genetics170:1849–1856. ern European populations during the same timescale Garrigan, D., Z. Mobasher, T. Severson, J. A. Wilder, and M. F. Hammer, 2005b Evidence for archaic Asian ancestry on the (Moorjani et al. 2011) and a greater genetic similarity be- humanXchromosome.Mol.Biol.Evol.22:189–192. tween East Africanand non-African samples than between Gravel,S.,B.M.Henn,R.N.Gutenkunst,A.R.Indap,G.T.Marth West African and non-African samples (Tishkoff et al. et al., 2011 Demographic history and rare allele sharing 2009). Together these studies provide additional support among human populations. Proc. Natl. Acad. Sci. USA 108: for the hypothesis that admixture between genetically di- 11983–11988. Green, R. E.,J.Krause, A.W.Briggs, T. Maricic, U. Stenzelet al., vergedgroupsisacommonfeatureofhumandemographic 2010 A draft sequence of the Neandertal genome. Science history. 328:710–722. ThenewpictureofhumanandNeanderthalancestrythat Gutenkunst, R. N., R. D. Hernandez, S. H. Williamson, and C. D. emerges from our results is almost certainly not complete, Bustamante, 2009 Inferring the joint demographic history of and our results suggest that intracontinental variation in multiple populations from multidimensional SNP frequency data.PLoSGenet.5:e1000695. levels of Neanderthal ancestry may be common. With the Hammer,M.F.,A.E.Woerner,F.L.Mendez,J.C.Watkins,andJ. current rate of progress in whole-genome sequencing and D. Wall, 2011 Genetic evidence for archaic admixture in the possibility of additional draft genomes from specimens Africa.Proc.Natl.Acad.Sci.USA108:15123–15128. of archaic individuals, we will soon learn more about the Henn,B.M.,C.R.Gignoux,M.Jobin,J.M.Granka,J.M.Macpher- admixtureprocess.Inparticular,theconstructionof“archaic son et al., 2011 Hunter-gatherer genomic diversity suggests admixture maps” detailing the distribution of archaic DNA asouthernAfricanoriginformodernhumans.Proc.Natl.Acad. Sci.USA108:5154–5162. segments in different modern human populations will help Hublin, J. J., 2009 Out of Africa: modern human origins special us to infer the timing, locations, and exact numbers of in- feature: the origin of Neandertals. Proc. Natl. Acad. Sci. USA trogression events and the role that archaic admixture may 106:16022–16027. have played in the evolution of the AMH genome. Krings, M., A. Stone, R. W. Schmitz, H. Krainitzki, M. Stoneking etal.,1997 NeandertalDNAsequencesandtheoriginofmod- ernhumans.Cell90:19–30. Acknowledgments Lachance,J.,B.Vernot,C.C.Elbers,B.Ferwerda,A.Fromentetal., 2012 Evolutionaryhistoryandadaptationfromhigh-coverage This work was supported in part by National Institutes of whole-genome sequences of diverse African hunter-gatherers. Health grants R01-GM40282 (to M.S.), R01-HG005226 (to Cell150:457–469. Lahr,M.M.,1994 Themultiregionalmodelofmodernhumanorigins J.D.W. and M.F.H.), and T32 HG 00047 (training grant to -areassessmentofitsmorphologicalbasis.J.Hum.Evol.26:23–56. M.A.Y.), as well as by National Science Foundation Gradu- Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., ate Research Fellowship Program Division of Graduate 2009 TheSequenceAlignment/MapformatandSAMtools.Bi- Education grant 1106400 (to M.A.Y.). oinformatics25:2078–2079. 208 J.D.Walletal.
Description: