Copyright(cid:2)2008bytheGeneticsSocietyofAmerica DOI:10.1534/genetics.107.080432 Testing for Archaic Hominin Admixture on the X Chromosome: Model Likelihoods for the Modern Human RRM2P4 Region From Summaries of Genealogical Topology Under the Structured Coalescent Murray P. Cox,* Fernando L. Mendez,† Tatiana M. Karafet,* Maya Metni Pilkington,‡ Sarah B. Kingan,* Giovanni Destro-Bisol,§ Beverly I. Strassmann** and Michael F. Hammer†,‡,**,1 *ARLDivision ofBiotechnology,†Departmentof Ecologyand EvolutionaryBiologyand ‡DepartmentofAnthropology,UniversityofArizona, Tucson, Arizona85721, §Department ofAnimal andHuman Biology,University ofRome ‘‘LaSapienza,’’ and Istituto Italianodi Antropologia,00185Rome, Italyand**Department ofAnthropology, UniversityofMichigan,AnnArbor,Michigan 48109 Manuscript receivedAugust 13, 2007 Acceptedforpublication November 11,2007 ABSTRACT A2.4-kbstretchwithintheRRM2P4regionoftheXchromosome,previouslysequencedinasampleof 41globallydistributedhumans,displayedbothanancienttimetothemostrecentcommonancestor(e.g., aTMRCAof(cid:2)2millionyears)andabasalcladecomposedentirelyofAsiansequences.Thispatternwas interpreted to reflect a history of introgressive hybridization from archaic hominins (most likely Asian Homo erectus) into the anatomically modern human genome. Here, we address this hypothesis by resequencingthe2.4-kbRRM2P4regionin131Africanand122non-Africanindividualsandbyextending thelengthofsequenceinawindowof16.5kbencompassingtheRRM2P4pseudogeneinasubsetof90 individuals.WefindthatboththeancientTMRCAandtheskewinnon-Africanrepresentationinoneof the basal clades are essentially limited to the central 2.4-kb region. We define a new summary statistic called the minimum clade proportion (p ), which quantifies the proportion of individuals from a spe- mc cifiedgeographicregionineachofthetwobasalcladesofabinarygenetree,andthenemploycoalescent simulationstoassessthelikelihoodoftheobservedcentralRRM2P4genealogyundertwoalternativeviews of human evolutionary history: recent African replacement (RAR) and archaic admixture (AA). A molecular-clock-basedTMRCAestimateof2.33millionyearsisastatisticaloutlierundertheRARmodel; however,thelargevarianceassociatedwiththisestimatemakesitdifficulttodistinguishthepredictionsof thehumanoriginsmodelstestedhere.Thep summarystatistic,whichhasimprovedpowerwithlarger mc samples of chromosomes, yields values that are significantly unlikely under the RAR model and fit expectations betterundera range of archaicadmixture scenarios. FOSSIL, archaeological, and genetic data all lend 2005a,b;PlagnolandWall2006;WallandHammer support to the hypothesis that Homo sapiens orig- 2006). Early studies of nonrecombining regions such inated in Africa (McBrearty and Brooks 2000; asmtDNAandtheYchromosomewereconsistentwith McDougall et al. 2005; Garrigan and Hammer the hypothesis of a single origin followed by complete 2006). With the acceptance of the role of Africa in replacement,sometimesreferredtoastherecentAfrican our species’ origin, there is now increasing interest in replacement (RAR) model. While many of the more re- thequestionofhowtheancestralpopulationthatgave centlypublishedDNAsequencingstudiesofX-linkedand risetoanatomicallymodernhumans(AMH)wasstruc- autosomallociarealsoconcordantwiththisRARmodel, tured.DidAMHemergefromasingle,isolatedAfrican agrowingnumberarenot(Evansetal.2006;Garrigan deme or from a subdivided ancestral population with and Hammer 2006). geneflowamongsubpopulations?Arelatedquestionis Garrigan et al. (2005b) published one of the first whether the expanding AMH population completely studiestopositrecentadmixturebetweenAMHandan replaced or interbred with then contemporaneous ar- archaichumanpopulation.Aresequencingstudyof2.4 chaic populations such as Neanderthals and H. erectus kb of the ribonucleotide reductase M2 pseudogene 4 (Eswaran 2002; Templeton 2002; Garrigan et al. (RRM2P4) in a sample of 41 globally diverse humans identified an unusual pattern of nucleotide polymor- phismcomparedwithmostofthehumangenome.The 1Corresponding author: ARL Division of Biotechnology, Life Sciences reconstructed gene tree revealed two clades of allelic South,UniversityofArizona,Tucson,AZ85721. E-mail:[email protected] sequences that were estimated to have diverged (cid:2)2 Genetics178:427–437(January2008) 428 M. P.Coxetal. Figure 1.—Location of the RRM2P4 locus on the X chromosome (top bar), positions of the threesequencedregions(middlebar),andplace- mentoftheRRM2P4pseudogenefragmentsrel- ative to the RRM2P4 central region (bottom bars).Verticalbarsinthecentralregionindicate SNPs defining the two basal clades. All coordi- nates correspond to the human genome UCSC March 2006 build. million years ago (MYA). One clade with very little se- highratesofrecombination($1.0cM/Mb).Theseloci, quencevariationwasspecifictoAsians,whiletheother which represent selectively neutral X-chromosomal di- more diverse clade of RRM2P4 sequences resembled a versity, will be published elsewhere. However, using pattern typical of human variation and was globally thesedata,weshowherethatsimpleRARmodelsoften distributed. By genotyping a diagnostic SNP in a large produceTMRCAvaluesthataresimilartothatofRRM2P4, sample of humans, the divergent ‘‘Asian clade’’ was whilegenealogieswithskewingofbasalclademember- shown to be frequent in Southeast Asians and nearly shiptowardnon-Africansremainstatisticaloutliers. absent in sub-Saharan Africans. The greater genealog- ical depth in Asia led to the hypothesis that RRM2P4 is a genomic remnant of introgressive hybridization SUBJECTS AND METHODS fromanAsianarchaicpopulation(H.erectus)intoAMH groups expanding from Africa. However, despite the Regionssequenced:ResequencedatafortheRRM2P4 increasingavailabilityofgenomicdata,theuniqueness locus were generated within a ‘‘trio’’ design (Figure 1), ofsuchdeeptimestothemostrecentcommonancestor which is an economical approach to jointly ascertain (TMRCA) and of gene trees with non-African basal detailedpolymorphismpatternsandlarger-scalelinkage- clades remains unclear. Importantly, we still do not disequilibrium profiles (Garrigan et al. 2005b). We se- understand the likelihood of such genealogies under quencedthreegenomicsegmentsof1725bp½University either the RAR model or models involving admixture of California, Santa Cruz (UCSC) March 2006 genome witharchaichumans. coordinates143,212,138–143,213,863(cid:3),2341bp(coordi- Garriganetal.’soriginalRRM2P4studywaslimitedby nates 143,219,154–143,221,495), and 1601 bp (coordi- a paucity of sequence data and primarily qualitative nates 143,227,057–143,228,658), which were separated analyses. Here, we extend earlier research by sequenc- by unsequenced regions of 5291 and 5562 bp, respec- ing the 2.4-kb RRM2P4 region in 131 African and 122 tively.Ourcentralsubregionisthesameasthatdescribed non-African individuals. We also determine the extent byGarriganetal.(2005b). of sequence that exhibits the unusual pattern of poly- Sampling: The RRM2P4 central region was se- morphism by selectively resequencing three DNA frag- quencedin131Africanand122non-Africanindividuals mentstotaling5.6kbwithina16.5-kbwindowflanking (panel A). These samples were chosen without prior RRM2P4. We infer the likelihood of the observed information about RRM2P4 lineage status. DNA sam- RRM2P4 genealogy using a suite of summary statistics ples representing Mandenka from Senegal (n ¼ 16), andMonteCarlocoalescentsimulationsundertheRAR BiakaPygmies fromtheCentralAfricanRepublic(n¼ model and a range of archaic admixture (AA) models 16), Khoisan from Namibia (n ¼ 9), French Basque (Nordborg2000;PlagnolandWall2006).TheRAR (n¼16),HanChinese(n¼16),andNasioifromBougain- models are parameterized by approximate Bayesian ville(n¼16)werepurchasedfromtheCentred’Etude computation (ABC) conditioned on resequence data duPolymorphismeHumain(Cannetal.2002).Samples from an additional data set of 19 neutral, unlinked X- of Baka Pygmies from Cameroon (n ¼ 23) were pro- chromosomalloci.Thisresequencedatasetrepresents vided by Giovanni Destro-Bisol and the Dogon from 12Mbfrom19regionsoftheXchromosome,whichare Mali (n ¼ 32) were provided by Beverly Strassmann. unlinked from genes (and each other) by medium to Samples from the Dinka of southern Sudan (n ¼ 21) ArchaicAdmixture ontheX Chromosome 429 were collected in Tucson, Arizona, with informed con- sent. Samples from three Siberian populations, the Selkups (n ¼ 32), Forest Nentsi (n ¼ 28), and Tundra Nentsi(n¼3),weredescribedpreviously(Karafetetal. 2002). Non-population-based samples (n ¼ 25) from the Y chromosome consortium cell lines (Y Chromo- some Consortium 2002) were also included in this panel.Resequencedataforthefulltrioweregenerated inasecondpanelof42Africanand48non-Africanin- dividuals(panelB)fromthreeAfricangroups(Khoisan, Mandenka, and Biaka) and three non-African groups (French Basque, Han, and Nasioi) (samples as de- scribed above). All sampling protocols were approved bytheHumanSubjectsCommitteeattheUniversityof Arizonaandbytheinstitutionsofallcollaboratorswho providedDNAsamples. Recombinationanalysisandgenetreedating:Ratesof linkage disequilibrium across the sequence were de- termined using LDhat (McVean and Spencer 2006). Figure 2.—Schematics of two demographic models. The TheRRM2P4centralregionshowsonlylimitedsignsof recent African replacement (RAR) model grows exponen- recombination, and a most parsimonious tree was tially from a 104 effective population size at 80 KYA, and reconstructed by breaking low-frequency reticulations. 500 individuals found the non-African population at 60 The TMRCA of the tree and the age of its polymor- KYA, experience a constant-sized bottleneck for 5 KYA, and phismswereestimatedwithGenetree(Griffiths2007). grow exponentially from 55 KYA. The ancient admixture (AA) model includes introgression from an ancestral homi- Genetree employs a full maximum-likelihood method nin source with constant population size of 103; the popula- that is based on the standard coalescent (Kingman tion leading to that of anatomically modern humans and 1982) and assumes an infinite-sites mutational model. ancestral hominins diverged 2 MYA. Modern effective sizes Likelihoodsurfacesforthepopulationmutationrate,u, andtheintercontinentalmigrationratewereinferredbyap- andthepopulationgrowthrate,b,weregeneratedun- proximate Bayesiancomputation. derapanmicticsingle-dememodelandanislandmodel of population structure. TMRCA values were inferred frommaximum-likelihoodparameterizationsunderboth the intergeneration interval (e.g., a 25-year mean in- models. tergeneration interval) does not alter our conclusions Demographicmodels:SummariesoftheRRM2P4cen- (ourunpublisheddata). tral region were compared with values obtained from TheRARmodeldepictstwopanmictic,exponentially simulationsunderastructuredcoalescent (Nordborg growingWright–FisherdemesrepresentingAfricanand 1997;Hudson2002)todeterminethelikelihoodofthe non-Africanpopulations.GrowthbeginsintheAfrican observed genealogy. We employed a framework for deme 80,000 years ago from a single ancestral popula- humandemographysimilartothatdevelopedbyPlagnol tion(N ¼104)andcontinuesuntilitreachesitscurrent e and Wall (2006), but modified (see below) to yield effectivesize.Asmallgroup(N ¼500)splitsfromthe e the number of segregating sites, S, consistent with an Africandeme60,000yearsagotoformthenon-African independent data set of 19 X chromosome loci se- deme.Thissubgroupexperiencesabottleneckfor5000 quencedinthesameindividuals(panelB).Wepresent years before expanding exponentially to its current resultsfromtwodemographicmodels:atwo-demeRAR effective size. Modern effective sizes and the intercon- model and an AA scenario similar to Nordborg’s tinental migration rate were parameterized by approx- (2000)isolationandadmixturemodel(Figure2).Alter- imateBayesiancomputation(seebelow). nativeRARmodelswithvaryinglevelsofrecentpopula- TheAAmodelincorporatesinstantaneousadmixture tion subdivision (e.g., one-deme and six-deme models) fromanancestralhomininsourceintotheRARmodel did not differ significantly from the two-deme RAR describedabove.Anancestralhomininpopulationwith model reported here (our unpublished data). These constanteffectivepopulationsizeof103splitsfromthe models are not meant to represent the true history of ancestorsofmodernhumansat2MYA.Weconsidereda human populations, but they do let us explore the ef- range of admixture rates (0–5%) from ancestral hom- fects of expansion and replacement vs. archaic admix- ininstomodernhumansinAsia,coupledwithaseriesof tureonpatternsofgenomicvariation.Coalescentdates admixture times (10–55 KY before present). We could (scaledby3N generations)weretranslatedtochrono- notinferanoptimizedAAmodelbecausethepaucityof e logical time using a 28-year mean intergeneration candidatelociisnotsufficientbothformodeltraining interval(Fenner2005).Theuseofalowerestimateof andforsubsequentstatisticaltesting. 430 M. P.Coxetal. Model fitting via approximate Bayesian computa- tion:WeparameterizedtheRARmodeltoreflectwhatis known about deep human demography and, conse- quently,toproducesimulateddatasetsthatmimicreal genomicdatasets.Parameterinferencerapidlybecomes computationally intractable at high-dimensional state spaces, such as those associated with complex demo- graphicmodels.Therefore,wefixedsomedemographic parameters that have been inferred elsewhere, e.g., thetimeofonsetofpopulationgrowthandnon-African bottleneck size (Plagnol and Wall 2006, Figure 2). However, a lower-dimensional state space of modern populationsize,N ,andtheintercontinentalmigration 0 ratepergeneration,m,wereinferredexplicitlybyABC (Beaumontetal.2002). Essentially,wegenerated104coalescentsimulationsfor each of 105 different sets of demographic parameters, Q¼{N ,m},thatweredrawnrandomlyfromtwouniform 0 distributions,N 2U½104; 105(cid:3) andm 2U½10(cid:4)11; 10(cid:4)8(cid:3): 0 Each set of coalescent simulations was compared with resequence data from 19 noncoding regions on the X chromosome (i.e., panel B, 42 African and 48 non- Figure 3.—Schematic of the minimum clade-proportion Africanindividuals;datatobepublishedelsewhere).We statistic, pmc.See textfordetails. estimatedu(¼3N m)foreachdemographicparameter 0 set using the average mutation rate of these 19 loci, the minimum quantity of individuals from a specified 8.33 10(cid:4)10/bp/year(range4.83 10(cid:4)10–1.63 10(cid:4)9),in- geographicregionorethnicgroup(suchasAfricans)in ferred from sequence divergence (assuming a human/ eachofthetwobasalcladesofabinarygenetree(Figure chimpanzeedivergencetimeof6MYA).Furthermore,we 3). (This statistic is described fully in the appendix.) chose to condition our ABC inference on the mean The p statistic has an intuitive interpretation. Con- number of segregating sites, S, that was observed across mc theadditionaldatasetof19Xchromosomeloci(S(cid:2)¼26). siderabinarytreeofAfricanandnon-Africansequences (Figure 3); clades (C , C ) can be defined as the two Thisstatisticwaschosenbecauseitvariesprimarilywith 1 2 basal branches that diverge from the coalescent of all themutationrate(aknownparameter)andtreedepth sampledindividuals.Here,thenumberofAfricanchro- (aparameterwewishedtoinfer).Weselectedthe0.01% mosomecopiesis(k ¼1,k ¼6)andthetotalnumber ofrandomdemographicparametersets,Q,whosecoa- 1 2 ofchromosomecopiesis(n ¼4,n ¼10).FromEqua- lescent data sets produced an average number of seg- 1 2 regating sites closest to the observed value (S(cid:2)¼26). tionA1,theminimumcladeproportionisthelesserof (k =n ¼1; k =n ¼ 6); here, p ¼1: The p statistic MeanvaluesforN0andmweredrawnfromthissubsetof 1 1 4 2 2 10 mc 4 mc was calculated directly from coalescent genealogies demographic parameter sets. These values represent usingcustomsoftware(codeavailableonrequest). best-fit estimatesfor themodern effectivesize, N ,and 0 migration rate, m, of the real-world demography un- derlyingourobserved19Xchromosomeneutral-locus RESULTS genomicdataset. Test statistics: We calculated the approximate like- Patterns of DNA sequence variation within and lihoods of two summary statistics: the TMRCA and the around RRM2P4: We sequenced 2.4 kb of the central minimumproportionofAfricansinoneofthetwobasal RRM2P4 segment, which encompasses the processed clades. Values were inferred from 105 replicates under pseudogene (Figure 1), in 131 African and 122 non- all demographic models. We note that these tests are African individuals. A total of 22 segregating sites were conservative, because both summaries are determined identified,producing23uniquehaplotypes(supplemen- directlyfromcoalescentgenealogies.Resolutionofthe tal Table 1 at http://www.genetics.org/supplemental/). underlying genealogy is constrained by S for real data SimilartothepreviousstudyofGarriganetal.(2005b), sets(cf.Nordborg 2000).Thedistribution ofTMRCA levels of nucleotide diversity are statistically higher in values was extracted from the output of Hudson’s non-Africans(u ¼0.136)thaninAfricans(u ¼0.0768; p p (2002) ms using custom software (code available on P > 0.001; Table 1). Also, the central subregion still request). shows minimal evidence of recombination despite a Here, we also define a new summary statistic, the sixfoldincreaseinthesizeofthedataset(supplemental minimum clade proportion (p ), which characterizes Figure 1 at http://www.genetics.org/supplemental/). mc ArchaicAdmixture ontheX Chromosome 431 TABLE 1 Estimates ofsummarystatistics for the RRM2P4central and flankingregions Lower5% 95% confidence Region Deme n Length S u /bp(%) quantile u /bp(%) interval w p 59 Global 83 1727 5 0.058 — 0.068 (0.052, 0.081) African 37 1727 5 0.069 — 0.084 (0.059, 0.098) Non-African 46 1727 3 0.040 — 0.056 (0.034, 0.073) Central Global 253 2320 22 0.155 0.120 0.110 (0.093, 0.126) African 131 2320 16 0.126 0.071 0.077 (0.066, 0.087) Non-African 122 2320 15 0.120 0.104 0.136 (0.105, 0.163) 39 Global 83 1612 7 0.088 — 0.061 (0.037, 0.081) African 37 1612 4 0.060 — 0.081 (0.044, 0.108) Non-African 46 1612 6 0.085 — 0.041 (0.014, 0.067) Statisticalconfidencewasdeterminedfrom104bootstrapreplicates.SNPdiversity,u ,issignificantlylowerintheflankingre- W gionsthaninthecentralregion(a,0.05).Africanu -valuesdonotdiffersignificantlyamongregions(a.0.05),butnon-African p u -andu -valuesaresignificantlyhigherinthecentralregion.Withinsequencedregions,non-Africanshavesignificantlyhigher p W u relative to Africans in thecentral region only (P >0.001,underlined). Thiseffect is not observed in the 59 (P ¼0.966) or p 39regions(P¼ 0.957). As before, this low level of recombination permitted To determine the length of sequence within and partial reconstruction of a single nonreticulating gene around the RRM2P4 locus that shows the unusual tree for the central region (Figure 4). This genealogy genealogical features, we sequenced two additional has the same two unusual characteristics described by fragments of 1725 and 1601 bp that flank the central Garrigan et al. (2005b): a deep TMRCA and a basal region(Figure1)inasubsetofAfricanandnon-African cladecomposedalmostentirelyofAsiansequences.For individuals (panel B). Levels of nucleotide diversity in example, of the 253 individuals resequenced for the African vs. non-African populations were more similar centralregion,21areintheleftmostdivergentcladein toaveragepatternsforthegenome,withAfricanvalues Figure4(‘‘cladeA’’),and20ofthesearefromEastAsia (u ¼0.0836and0.0805)greaterthannon-Africanvalues p and Oceania. A single Dogon from Mali was the only (u ¼ 0.0556 and 0.0412) in the 59- and 39-flanking p African member of this clade. This haplotype carries a regions, respectively (Table 1). Recombination rates SNP(1639)thatoccurs inbothbasalclades.Anorigin are low (0.43 cM/Mb) for the 59 and central RRM2P4 of this haplotype through recombination rather than subregions, but are substantially elevated (16 cM/Mb) homoplasyismorelikelygiventhelowmutationrateof betweenthecentraland39subregions(supplementalFig- the RRM2P4 locus together with its moderate rate of ure1athttp://www.genetics.org/supplemental/).Thisre- recombination.BecausetheparentalcladeAformwas combinationhotspoteffectivelyunlinksthecentraland found only in Asia and this haplotype is shared with a 39 subregions, which therefore have largely indepen- Melanesian, we suggest that this recombinant lineage dent evolutionary histories. A network representing may have originated in Asia and migrated recently to sequences of the entire 16.4-kb region illustrates the Africa. decouplingofthesetwogenomicregions(supplemental We used both molecular-clock and coalescent ap- Figure 2 at http://www.genetics.org/supplemental/). proaches to estimate the TMCRA of the RRM2P4 cen- The unusual genealogy is less apparent in the 59 frag- tralsubregiongenetree.Outgroupcomparisonsreveal mentdespitesomelinkagedisequilibriumwiththecen- an average of 22 nucleotide substitutions between all tral region. (Recombination rates in these regions are human and chimpanzee central RRM2P4 region se- about one-third the X chromosome average; supple- quences.Givenanaverageof8.53nucleotidedifferences mental Figure 1.) However, only five segregating sites observedbetweenthetwohumanRRM2P4lineages(i.e., were identified in the 59-flanking region, a reduction the average number of mutations between sequences of63% overthecentral region diversity(Table1), and across the base of the human gene tree), we estimate noderivedpolymorphismswereidentifiedinthe59re- thatthetwodeepesthumancladesdiverged(cid:2)2.33MYA gion on chromosomes carrying central region clade A (assuming a 6-MYA human–chimpanzee divergence haplotypes(supplementalTable2athttp://www.genetics. time). We also inferred the TMRCA of the RRM2P4 org/supplemental/). Because the unusual pattern of central subregion using a full maximum-likelihood polymorphism is most apparent in the central region, method under both a panmictic and an island model. we focus further analyses solely on this portion of the These models yielded TMRCA values of 1.24 and 2.88 sequencedregion. MYA,whichbracketthemolecular-clockdate(datanot RAR model parameters: TheRARmodelwasparame- shown). terizedbyABC.Becausethemodelcannotbeparameterized 432 M. P.Coxetal. Figure 4.—Time-scaled gene tree of the RRM2P4 central region. Polymorphisms are labeled with the nomenclature of Garriganetal.(2005b)andareequivalenttothatpublication’sFigure1.Italicizedandunderlinedpolymorphismsindicateprob- ablereticulationpoints.TheproportionsofAfricanandnon-Africanindividualscarryingeachlineageareindicatedbeneaththe tree.Notetheoverrepresentationofnon-Africansintheleftmostbasalclade(‘‘cladeA’’).Molecular-clockestimatesoftheTMRCA andmutation agesareshown in millionsof years onthevertical axis. on the test locus (here, RRM2P4), we conditioned the Both the distribution of TMRCA values and the pro- modelonaseparatetrainingdatasetof19independent portion of African individuals in one of the two basal X chromosome noncoding regions. These resequence clades were inferred under both models. The mean dataaretooextensivefordetaileddescriptionhereand TMRCAunderthetwo-dememodelwas1.15MYA,with arethesubjectofaseparatepublication.However,basic datesexceeding2.12MYAbeingstatisticaloutliers(a¼ summaries of these 19 loci relevant to the current 0.05). While the maximum-likelihood method assum- analysisarepresentedinsupplementalTable3athttp:// ingpanmixiareturnedameanTMRCAvaluesimilarto www.genetics.org/supplemental/. The two-deme RAR thatproducedundertheRARmodel½i.e.,P(TMRCA. modelwasparameterizedbygenerating105randomN 1.24 3 106 j RAR) ¼ 0.350(cid:3), both the molecular clock 0 and m values (see full description in subjects and ½P(TMRCA . 2.33 3 106 j RAR) ¼ 0.030(cid:3) and the methods) and accepting the 0.01% that best matched maximum-likelihoodmethodassuminganislandmodel the number of segregating sites observed in 19 inde- ½P(TMRCA.2.883106jRAR)¼0.008(cid:3)yieldedTMRCA pendentXchromosomenoncodingloci(supplemental valuesthatareunusuallyold.UndertheAAmodel,the Figure 3 at http://www.genetics.org/supplemental/). TMRCAdistributionisshifteddeeperintimerelativeto The optimal parameters were inferred as a modern RAR models, and genealogies exceeding 3 MYA would effective size, N , of 12,300 (range 12,000–12,500) and be expected under our archaic admixture scenario 0 anintercontinentalmigrationratepergeneration,m,of (Figure5,aandb).Themolecular-clock-basedestimate 3.62 3 10(cid:4)9 (range 6.75 3 10(cid:4)10–8.24 3 10(cid:4)9). Using ofthetrueTMRCAisnotanoutlierundertheAAmodel parametervaluesattheextremesoftheserangeshadlittle asspecifiedhere½P(TMRCA.2.333106jAA)¼0.140(cid:3). effect on the following statistical analyses (our unpub- The geographical distribution of lineages on the lisheddata).Importantly,theparameterizedRARmodel central RRM2P4 genealogy is also skewed: one of the producessimulateddatasetsthatyieldsummaries(such two basal clades (Figure 4, clade A) is found infre- as growth rates and effective sizes) that are consistent quently in Africans (0.048) but commonly in non- with other demographic inferences (e.g., from Gene- Africans (0.952). This pattern can also be compared tree).Althoughthereisstrictlynowaytoassessthetrue with the empirical distribution of p , as determined mc history of our samples, this best-fit RAR model is a fromtheadditionaldatasetof19Xchromosomeloci, reasonablefirstapproximationforhumandemographic among which the smallest minimum African clade history,asreconstructedfromtheXchromosome. proportion is 0.214. The likelihood of observing the Assessing the uniqueness of the RRM2P4 genealogy: same, or a more extreme, proportion of Africans (i.e., The uniqueness of the RRM2P4 central region was 1)inabasalcladewasstatisticallysignificantunderthe 21 estimatedusingasimulation-basedsummary-likelihood RARmodel½P(p #0.048jRAR)¼0.031(cid:3),butnotso mc approach under the RAR and AA models (Figure 2). under the AA model ½P(p # 0.048 j AA) ¼ 0.24(cid:3). To mc ArchaicAdmixture ontheX Chromosome 433 Figure5.—Distributionandcumulativeprobabilityof(aandb)TMRCAsand(candd)minimumcladeproportionsunderthe optimaltwo-demeRARmodel(solidcurve)andthecorrespondingmodelwith5%introgressionfromancestralhomininsat50KYA (dashedcurve).Themolecular-clockdateisshownbyasolidverticallineinaandb;thepanmicticandislandmodeldatesareshownby dottedverticallines(totheleftandright,respectively).Theadmixturepeakwouldincreaseasadmixtureoccurredmorefrequently andshiftrightwithdeeperstructurebetweenadmixingdemes.Theobservedp isillustratedbyasolidverticallineincandd. mc furtherexploretheeffectsofarchaicadmixtureonthe 2001; Satta and Takahata 2004). Garrigan et al. minimumcladeproportion,wetooktheoptimizedRAR (2005b)also genotyped asinglediagnosticSNP to test model and incorporated variable rates of admixture for the presence of clade A in a larger number of (0.5–5%)occurringatvariabletimes(10–55KYbefore samples (n ¼ 570 from 17 globally distributed popula- present)andran105simulationsateachpointona103 tions).Theydiscoveredadecreasingfrequencygradient 10 grid of parameter space (Figure 6). Basal clades centered on southern China (where the clade A is dominatedbynon-Africansareobservedmoreoftenon present .50%) and extremely low frequencies of the averageasarchaicadmixturebecomesmorerecentand ‘‘Asian’’ divergent lineage in Europe, the Middle East, morefrequent.Higherratesofadmixturemakeitmore andAfrica.(SeeFigure1inGarriganetal.2005b.)To likely that one of the two basal clades derives entirely explaintheprevalenceofbasalRRM2P4lineagesinEast from the descendants of Asian ancestral hominins Asiatheyfavoredamodelofrecentadmixturebetween (p / 0), and more recent admixture makes it less divergent AMH and H. erectus populations; although mc likely that migrants will carry admixed lineages into they could not rule out founder effects leading to the Africanpopulations(also,p /0). lossofoneofthetwodivergentlineagesinAfrica.Here, mc we resequence this locus in a much larger sample of humans, extend the length of the sequenced region, DISCUSSION and test unusual aspects of the genealogy statistically, Garriganetal.(2005b)describeda2.4-kbregionon usingamodel-basedcoalescentsimulationframework. theXchromosomewithunusualgenealogicalstructure We generated resequencing data 59 and 39 of the in a sample of 41 humans: a deep TMRCA and a basal central2.4-kbregioninapanelof90individualstosee clade composed entirely of Asian (n ¼ 3) sequences. whether the pattern originally described by Garrigan This differs from most genealogies observed to date, et al. (2005b) extended farther along the X chromo- where African individuals dominate at least one of the some. We found that a strong recombination hotspot two basal clades (Labuda et al. 2000; Takahataet al. almostcompletelydecouplesthecentraland39regions, 434 M. P.Coxetal. linkage with a third selected locus located farther 59 unlikely. In sum, it appears that polymorphisms that clearlydefinetheunusualRRM2P4genealogyarefound only within the stretch of noncoding DNA associated withthepseudogene(Figure1)andthatthesesitesare unlikelytobeaffectedbyrecentpositiveselection. Oursixfoldlargerdatabaseofcentralregionsequences doesnotresultinasubstantialchangeinthetopologyof thegenetreedescribedbyGarriganetal.(2005b)orin the partitioning of its two deepest branches. However, wedidobserveseveralnewlineagesatlowfrequencyand discovered some novel evidence of recombination. Of the21individualsidentifiedwiththelessfrequentbasal lineage, only one was from Africa. This Dogon in- dividual from Mali was the same African individual identified as carrying a clade A lineage on the basis of the SNP-based genotyping assay in Garrigan et al. (2005b). Our extended sequence database identified Figure 6.—Effect of admixture parameters on the mini- onlyasinglenewpolymorphismincladeAand11new mumcladeproportion.FewerAfricansarefoundinonebasal polymorphismsinthemorediversegloballydistributed cladeasarchaicAsianadmixtureoccursmorefrequentlyand cladeB(compareFigure1ofGarriganetal.2005bwith morerecently.Thelikelihoodsurfaceisgeneratedbyinterpo- lationfrommeanp valuesof105simulationsateachpoint ourFigure4).Thisbringsthetotalnumberofhaplotypes mc ina10310grid(dots)coveringtheparameterspaceofad- inthedivergentAsiancladeAto3,comparedwithatotal mixture timeandadmixture proportion. of20haplotypesintheotherbasalclade.Interestingly, the new clade A haplotype was identified only in two individuals:theaforementionedDogonindividualand and despite linkage disequilibrium with the 59 end of anindividualfromBougainvilleIsland,Melanesia.This the sequence, the pattern of higher Asian diversity for is consistent with the possibility that recent migration the central region was not found for the other two carriedthisrarelineagebetweencontinents.Although regions. While it is possible that this reduced diversity we cannot exclude the possibility that unsampled may simply reflect the stochastic nature of the muta- Africanpopulationscarrythiscladeathigherfrequency tional process given the relatively short length of se- thanweobservehere,ourgeographicalcoverageofthe quenceexamined((cid:2)2kb),wesuggestthatthelineage Africancontinentisstillquiteextensive. history of the central region has been at least partly Our simulations show that the 2.33-million-year decoupledfromthoseofthe59and39regionsthrough molecular-clock-based TMRCA estimate is a statistical recombination. BLAST results show that the RRM2P4 outlier under the RAR model (Figure 5, a and b), but processed pseudogene has sequence conservation to notunderanAAmodelwithancestralstructuredating the rhesus macaque (Macaca mulatta). Therefore, the from 2 MYA. However, maximum-likelihood estimates highdiversityofthecentralregiondoesnotresultfrom of the TMRCA inferred under different models of increasedmutationatthetimeofpseudogeneinsertion humanpopulationstructurespanawiderangeoftimes. becausehumanvariationtracesbackonlytotheHomo ThelargevarianceinTMRCAestimatesisalsoexpected lineage(i.e.,theTMRCAofhumanhaplotypesismuch because the short length of the central subregion se- more recent than the insertion time). Central region quencelimitsthenumberofsitessegregating between diversity is also unlikely to reflect paralogous gene thetwobasalclades.Unfortunately,thevarianceofTMRCA conversionbecausetheRRM2P4sequencehasnoclose isalwayslargerelativetothemeananddoesnotdecrease matchestootherregionsinthehumangenome.Finally, appreciablywithincreasedsamplesize(Griffithsand increased central region diversity is unlikely to reflect Tavare´ 1994; Tang et al. 2002; Basu and Majumdar linkagetoageneunderlong-termbalancingselection. 2003).Indeed,asweincreasedthesamplesizefrom41 The nearest 59 gene is SPANX-N2, which encodes a (Garrigan et al. 2005b) to 253, we found no new protein of uncharacterized function. SPANX-N2 is 580 polymorphic sites on the basal branches of the gene kb,or(cid:2)1.8cM,distantfromtheRRM2P4pseudogene, tree,whichsuggeststhatwehavesampledsufficientlyto and the intervening region contains at least three hot- observe the basal node of the genealogy for the entire spotsof(cid:2)16cM/Mb(eachsimilartothehotspot39of global population. There is only a remote probability theRRM2P4locus).Whilewecannotexcludethepossi- that we have not observed the deepest split of the bility that an unknown 59 functional variant may be RRM2P4 tree in our data set of 253 individuals (P (cid:5) linked to the RRM2P4 locus, the disparity in diversity 0.0079)(Saundersetal.1984;KlimanandHey1993). between the RRM2P4 59 and central regions makes Our TMRCA estimates, while consistent with the AA ArchaicAdmixture ontheX Chromosome 435 model specified here, are unlikely to be improved by gions depends largely on the amount of admixture samplingadditionalindividuals. between the two archaic populations (Wall 2000). WealsoconsideredtheobservationofGarriganetal. Moreover, unless admixture was recent and involved (2005b)thatAsiansamplesareoverrepresentedinone highly divergent populations, the power to detect ar- ofthetwobasalcladesoftheRRM2P4tree.Thispattern chaicadmixtureislow(Nordborg2000).Inthecaseof continuedtoholdevenafterincreasingthesizeofour RRM2P4, divergence may have started at the time of DNA sequence data set. To assess how unusual this separationofH.ergaster/H.erectuspopulationsinAfrica aspect of the RRM2P4 genealogy is under alternative (cid:2)2MYA(AntonandSwisher2004).Yet,thelengthof modelsofhumanevolutionaryhistory,wedefinedanew the divergent sequence is short, possibly as a result of summarystatistic,p ,whichquantifiestheskewinthe thenearbyrecombinationhotspotorbecauseadmixture mc proportionofindividualsfromtwopopulationsamong occurred in the more distant past and recombination the two basal clades of a gene tree. This summary has subsequently broken down the admixed chromo- statistic is applicable to RRM2P4 because the central some. In any case, identifying longer sequences with region is essentially tree-like. We observed p ¼ 0.048 greater divergence would allow for more sophisticated mc forthecentralregionofRRM2P4,whichissignificantly tests of archaic admixture. For example, Wall (2000) unlikely under the RAR model (P ¼ 0.031, Figure 5, c suggestedanumberofsummarystatisticsthatarebased and d). To examine the sensitivity of the p statistic onboththelevelofdivergencebetweentwocladesand mc under a range of archaic admixture parameters, we theamountofrecombinationbetweenthem. simulated coalescent genealogies and varied both the Recent population structure is another factor that admixture proportion and the timing of introgression may affect the probability of sampling a locus with (Figure6).Althoughgenealogieswithsmallp values a genealogy showing signs of archaic admixture. As mc are more common as the admixture proportion in- pointedoutbyNordborg(2000),populationstructure creases(i.e.,upto(cid:2)5%)andintrogressionbeginsmore mayactuallyincreasethepowertodetectarchaicadmix- recently (i.e., as recently as (cid:2)10 KYA), the RRM2P4 tureifwesamplesufficientlyamongdemes,becausewe genealogyisnotasignificantoutlierunderanyofthese wouldexpecttheintrogressedallelestostillbepresent AAmodelparameterizations.Thisfurthersupportsour in the area of the world where admixture took place. conclusionthattheRRM2P4genealogyfitsexpectations In the case of RRM2P4, individuals carrying the less betterunderascenarioofarchaicadmixture. frequent divergent lineage are concentrated in East WhiletherearelimitationswithbothTMRCAandp Asia, suggesting that admixture may have occurred at mc for distinguishing predictions of the RAR and AA the Asian end of the global distribution of human models, they do represent independent summaries of populations.Ontheotherhand,itisimportanttopoint thedataand,thus,complementoneanother.Asalready out that current population structure is unlikely to re- mentioned, the power of these two test statistics de- flect ancient patterns directly. Following a demic ex- pendsondifferentaspectsofsampling.Varianceinthe pansion,whatwasoncesubdivisionbetweentwoAfrican estimateoftheTMRCAisimprovedbylongersequences populationsmaynowappearasstructurebetweenAfri- of the region with tree-like ancestry (in the case of can and non-African populations. For loci with more RRM2P4thisislimitedbythesmallcentralregionand ancientTMRCA,thereisanincreaseinpowertodetect flankingrecombination),butonlyslightlybyincreasing archaic admixture even if it occurred at more ancient thesamplesize.Ontheotherhand,estimatesofp can times(Nordborg2000).ThismeansthatforRRM2P4, mc beimprovedbyincreasingthesamplesize,becausethe whichhasanancientTMRCA,wecannotbeconfident variance of p decreases approximately as the inverse about where archaic admixture may have occurred mc of the sample size (analyses not shown). If the p is geographically. In this regard, it is interesting to note mc genuinelyanoutlier undertheRAR model,increasing thatagrowingnumberoflocihavebeendiscoveredwith thenumberofindividualssampledincreasesthepower two deeply divergent lineages where both the major to reject RAR. Indeed, when we use the SNP data of and the minor types are present only in African pop- Garrigan et al. (2005b), which included 177 Africans ulations (Barreiro et al. 2005; Garrigan et al. 2005a; and 393 non-Africans, we reject the RAR model with Hayakawa et al. 2006). This supports modelsinwhich greaterconfidence(p ¼0.0189,P¼0.014). anatomicallymodernhumansdescendfromastructured mc Furtherevidenceinsupportofanarchaicadmixture ancestral African population (Garrigan and Hammer modelawaitsanalysisofadditionallociexhibitinggene- 2006). We find some support that elevated admixture alogical properties similar to the central RRM2P4 re- among highly divergent African subpopulations just gion. Several candidates have already been identified prior to the recent African expansion could explain (Hardingetal.1997;Zie˛tkiewiczetal.2003;Stefansson thepatternofpolymorphismatRRM2P4(supplemental etal.2005;Shimadaetal.2007),butmostlackrigorous Figure 4 at http://www.genetics.org/supplemental/), statistical analyses under a range of demographic but note that this model has little power to explain models, including ancient admixture alternatives. The why RRM2P4 clade A lineages are geographically re- frequency at which we expect to find introgressed re- strictedtoEastAsiatoday.Fornow,thislocusrepresents 436 M. P.Coxetal. agenealogicalhistorythatismostconsistentwithrecent Karafet,T.M.,L.P.Osipova,M.A.Gubina,O.L.Posukh,S.L.Zegura admixturefromanarchaichomininpopulationinAsia. etal.,2002 HighlevelsofY-chromosomedifferentiationamong nativeSiberianpopulationsandthegeneticsignatureofaboreal We thank Zahra Mobasher (University of Arizona) for excellent hunter-gathererwayoflife.Hum.Biol.74:761–789. technical assistance and David Morales (University of Arizona) for Kingman,J.F.C.(Editor),1982 OntheGenealogyofLargePopulations. helpfuldiscussion.ThisresearchformspartoftheHOMINIDproject, AppliedProbabilityTrust,Sheffield,UK. Kliman, R. M., and J. Hey, 1993 DNA sequence variation at the agenomicresequencestudyfundedbyNationalScienceFoundation periodlocuswithinandamongspeciesattheDrosophilamelangaster grantBCS-0423670. complex.Genetics133:375–387. Labuda,D.,E.Zie˛tkiewiczandV.Yotova,2000 Archaiclineages inthehistoryofmodernhumans.Genetics156:799–808. LITERATURE CITED McBrearty,S.,andA.S.Brooks,2000 Therevolutionthatwasn’t: Anton, S. C., and C. C. Swisher, 2004 Early dispersal of Homo anewinterpretationoftheoriginofmodernhumanbehavior. J.Hum.Evol.39:453–563. fromAfrica.Annu.Rev.Anthropol.33:271–296. McDougall,I.,F.H.BrownandJ.G.Fleagle,2005 Stratigraphic Barreiro,L.B.,E.Patin,O.Neyrolles,H.M.Cann,B.Gicquel placement and age of modern humans from Kibish, Ethiopia. etal.,2005 Theheritageofpathogenpressuresandancientde- Nature433:733–736. mography in the human innate-immunity CD209/CD209L re- McVean,G.,andC.C.Spencer,2006 Scanningthehumangenome gion.Am.J.Hum.Genet.77:869–886. forsignalsofselection.Curr.Opin.Genet.Dev.16:624–629. Basu,A.,andP.Majumdar,2003 Acomparisonoftwopopularsta- Nordborg,M.,1997 Structuredcoalescentprocessesondifferent tisticalmethodsforestimatingthetimetomostrecentcommon timescales.Genetics146:1501–1514. ancestor(TMRCA)fromasampleofDNAsequences.J.Genet. Nordborg,M.,2000 Ondetectingancientadmixture,pp.123–136 82:7–12. in Genes, Fossils and Behaviour: An Integrated Approach to Human Beaumont,M.A.,W.ZhangandD.J.Balding,2002 Approximate Evolution,editedbyP.Donnelly.IOSPress,Amsterdam. Bayesian computation in population genetics. Genetics 162: Plagnol,V.,andJ.D.Wall,2006 Possibleancestralstructureinhu- 2025–2035. manpopulations.PLoSGenet.2:e105. Cann,H.M.,C.deToma,L.Cazes,M.F.Legrand,V.Moreletal., Satta,Y.,andN.Takahata,2004 Thedistributionoftheancestral 2002 Ahumangenomediversitycell linepanel.Science296: haplotype in finite stepping-stone models with population ex- 261–262. pansion.Mol.Ecol.13:877–886. Eswaran,V.,2002 AdiffusionwaveoutofAfrica:themechanismof Saunders,I.W.,S.Tavare´andG.A.Watterson,1984 Onthege- themodernhumanrevolution?Curr.Anthropol.43:749–774. nealogyof nested subsamples froma haploidpopulation. Adv. Evans,P.D.,N.Mekel-Bobrov,E.J.Vallender,R.R.HudsonandB. Appl.Probab.16:471–491. T.Lahn,2006 Evidencethattheadaptivealleleofthebrainsize Shimada,M.K.,K.Panchapakesan,S.Tishkoff,A.Q.Nato,Jr.and genemicrocephalinintrogressedintoHomosapiensfromanarchaic J. Hey, 2007 Divergent haplotypes and human history as re- Homolineage.Proc.Natl.Acad.Sci.USA103:18178–18183. vealedinaworldwidesurveyofX-linkedDNAsequencevariation. Fenner,J.N.,2005 Cross-culturalestimationofthehumangenera- Mol.Biol.Evol.24:687–698. tion interval for use in genetics-based population divergence Stefansson,H.,A.Helgason,G.Thorleifsson,V.Steinthorsdottir, studies.Am.J.Phys.Anthropol.128:415–423. G.Massonetal.,2005 Acommoninversionunderselectionin Garrigan,D.,andM.F.Hammer,2006 Reconstructinghumanori- Europeans.Nat.Genet.37:129–137. ginsinthegenomicera.Nat.Rev.Genet.7:669–680. Takahata,N.,S.H.LeeandY.Satta,2001 Testingmultiregionality Garrigan,D.,Z.Mobasher,S.B.Kingan,J.A.WilderandM.F. ofmodernhumanorigins.Mol.Biol.Evol.18:172–183. Hammer,2005a Deephaplotypedivergenceandlong-rangelink- Tang,H.,D.O.Siegmund,P.Shen,P.J.OefnerandM.W.Feldman, agedisequilibriumatXp21.1provideevidencethathumansdescend 2002 Frequentistestimationofcoalescencetimesfromnucleo- fromastructuredancestralpopulation.Genetics170:1849–1856. tide sequence data using a tree-based partition. Genetics 161: Garrigan, D., Z. Mobasher, T. Severson, J. A. Wilderand M. F. 447–459. Hammer,2005b EvidenceforarchaicAsianancestryonthehu- Templeton,A.,2002 OutofAfricaagainandagain.Nature416:45– manXchromosome.Mol.Biol.Evol.22:189–192. 51. Griffiths, R. C., 2007 Genetree v. 9.0. http://www.stats.ox.ac.uk/ Wall,J.D.,2000 Detectingancientadmixtureinhumansusingse- (cid:2)griff/software.html. quencepolymorphismdata.Genetics154:1271–1279. Griffiths,R.C.,andS.Tavare´,1994 Simulatingprobabilitydistri- Wall,J.D.,andM.F.Hammer,2006 Archaicadmixtureinthehu- butionsinthecoalescent.Theor.Popul.Biol.46:131–159. mangenome.Curr.Opin.Genet.Dev.16:606–610. Harding,R.M.,S.M.Fullerton,R.C.Griffiths,J.Bond,M.J.Cox YChromosomeConsortium,2002 Anomenclaturesystemforthe etal.,1997 ArchaicAfricanandAsianlineagesinthegenetican- treeofhumanY-chromosomalbinaryhaplogroups.GenomeRes. cestryofmodernhumans.Am.J.Hum.Genet.60:772–789. 12:339–348. Hayakawa, T., I. Aki, A. Varki, Y. Satta and N. Takahata, Zie˛tkiewicz,E.,V.Yotova,D.Gehl,T.Wambach,I.Arrietaetal., 2006 Fixation of the human-specific CMP-N-acetylneuraminic 2003 Haplotypes in the Dystrophin DNA segment point to a acidhydroxylasepseudogeneandimplicationsofhaplotypedi- mosaicoriginofmodernhumandiversity.Am.J.Hum.Genet. versityforhumanevolution.Genetics172:1139–1146. 73:994–1015. Hudson, R. R., 2002 Generating samples under a Wright-Fisher neutralmodelofgeneticvariation.Bioinformatics18:337–338. Communicatingeditor:N.Takahata APPENDIX Here,wedefineanew summarystatistic,theminimumcladeproportion (p ).Initssimplest form,thisstatistic mc characterizestheproportionofindividualsfromaspecifiedgroup(e.g.,Africans)ineachofthetwobasalcladesofa binarygenetree.Theminimumcladeproportioncanthusbedefinedas (cid:2) (cid:3) k k p ¼min 1; 2 ; ðA1Þ mc n n 1 2
Description: