ebook img

Estimation of hominoid ancestral population sizes under Bayesian coalescent models ... PDF

16 Pages·2008·0.23 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Estimation of hominoid ancestral population sizes under Bayesian coalescent models ...

Estimation of Hominoid Ancestral Population Sizes under Bayesian Coalescent Models Incorporating Mutation Rate Variation and Sequencing Errors Ralph Burgess*(cid:1) and Ziheng Yang*(cid:2) *GaltonLaboratory,DepartmentofBiology,UniversityCollegeLondon,London,UnitedKingdom;(cid:1)WatsonSchoolofBiological Sciences,ColdSpringHarborLaboratory,ColdSpringHarbor,NY;and(cid:2)LaboratoryofBiometrics,GraduateSchoolofAgriculture andLifeSciences, University of Tokyo,Tokyo,Japan Estimation of population parameters for the common ancestors of humans and the great apes is important in understandingourevolutionaryhistory.Inparticular,inferenceofpopulationsizeforthehuman–chimpanzeecommon ancestor may shed light on the process by which the 2 species separated and on whether the human population experienced a severe size reduction in its early evolutionary history. In this study, the Bayesian method of ancestral inferenceofRannalaandYang(2003.Bayesestimationofspeciesdivergencetimesandancestralpopulationsizesusing DNA sequences from multiple loci. Genetics. 164:1645–1656) was extended to accommodate variable mutation rates amonglociandrandomspecies-specificsequencingerrors.Themodelwasappliedtoanalyzeagenome-widedatasetof ;15,000neutralloci(7.4Mb)alignedforhuman,chimpanzee,gorilla,orangutan,andmacaque.Weobtainedrobustand precise estimates for effective population sizes along the hominoid lineage extending back ;30 Myr to the cercopithecoid divergence. Theresultsshowed thatancestral populations were5–10timeslarger thanmodern humans along the entire hominoid lineage. The estimates were robust to the priors used and to model assumptions about recombination.TheunusuallylowXchromosomedivergencebetweenhumanandchimpanzeecouldnotbeexplainedby variation in the male mutation bias or by current models of hybridization and introgression. Instead, our parameter estimates were consistent with a simple instantaneous process for human–chimpanzee speciation but showed a major reductioninXchromosomeeffectivepopulationsizepeculiartothehuman–chimpanzeecommonancestor,possiblydue toselective sweeps onthe Xprior toseparation of the 2 species. Introduction 1991). Takahata (1993) has argued that the diversity and deepcoalescenceofMHCallelesimplyN(cid:1)100,000over TheeffectivesizeNofapopulationisdirectlyrelated this longer timescale. tothegeneticdiversitythatismaintainedinthepopulation Amoreancientdemographichistoryofthehumanlin- (KimuraandCrow1964).Analysesofpolymorphismdata eagemaybeinferredthroughcomparisonwiththegenomes haveconsistentlyestimatedN(cid:1)10,000forthemodernhu- of other primates. The evolutionary distance between or- manlineage(e.g.,NeiandGraur1984;Takahata1993;Yu thologous sequences from 2 species, such as human (H) etal.2001).Furthermore,astudyofgenome-widelinkage and chimpanzee (C),isattributable to 2time components: disequilibrium, using an independent method, suggested the speciation time s , which is common to all loci, and that our recent population size was 2–3 times smaller HC thecoalescenttimeintheHCancestralpopulation,whichis (Tenesa et al. 2007). Although it is well recognized that variableamongloci.Calendartimesarescaledintoevolu- effectivepopulationsizemaybemuchsmallerthancensus tionarydistances,withs 5 T l,whereT isthetime population size, these estimates are still surprisingly low, HC HC HC and there is considerable interest in determining possible inyearstohuman–chimpanzeespeciationandlisthemu- population bottlenecks in human evolutionary history. As tation rate, measured by the number of mutations per site for other modern hominoids, studies of polymorphism in per year. Coalescence rate and variance are determined nuclear noncoding regions estimated chimpanzee popula- solely by the scaled ancestral population size hHC 5 tion size to be N (cid:1) 21,000 and gorilla N (cid:1) 25,000 (Yu 4NHCgl,wheregisthegenerationtimeinyears.Withasin- etal.2001,2004;seealsoKaessmannetal.2001).Limited gle locus, one cannot distinguish recent speciation with datafortheorangutansuggestedhigherdiversity,butanex- alargeancestralpopulation(smalls andlargeh )from HC HC cessofintermediate-frequencyallelesimplicatedpopulation ancient speciation with a small ancestral population (large subdivision rather than greater size (Fischer et al. 2006). s andsmallh ).However,Takahata(1986)pointedout HC HC UndertheFisher–Wrightmodel,theexpectedtimefor thatdatafrommultiplelocicontaininformationabouth HC alargesampleofallelesataneutrallocustofindtheirmost in the variation of observed divergences among loci, en- recentcommonancestoris;4Ngenerations,withstandard abling us to estimate s and h jointly. Such inference HC HC deviation (SD) at ;2.15N (e.g., Tajima 1983). Therefore, is nevertheless sensitive to mutation rate variation among withagenerationtimeof;20yearsandaneffectivepop- loci (Yang 1997b). ulation size N (cid:1) 10,000, human diversity at neutral loci Withdatafor3ormorecloselyrelated species,addi- contains no demographic information much beyond 1.5 tional information is provided by incomplete lineage sort- Myr. However, at the major histocompatibility complex ing.Considerthecaseofhuman(H),chimpanzee(C),and (MHC),balancingselectionhasextendedthemeancoales- gorilla (G), where the species tree is ((HC)G) (see fig. 1). cencetimeforhumanDRB1allelesto;29Myr(Sattaetal. CoalescenceofHandCallelesmayoccurintheinternode ancestral population HC, in which case the gene tree Keywords:hominoid,ancestralpopulationsize,Bayesianinference, matches the species tree. Otherwise, if all 3 alleles enter MCMC,coalescentmodel,sequencingerrors. the HCG ancestral population, they may coalesce in any E-mail:[email protected]. orderwithequalprobability,with2ofthe3possiblegene Mol.Biol.Evol.25(9):1979–1994.2008 treesinconflictwiththespeciestree.Theprobabilityofspe- doi:10.1093/molbev/msn148 AdvanceAccesspublicationJuly4,2008 cies tree–gene tree mismatch (PSG) is thus two-thirds the (cid:1)TheAuthor2008.PublishedbyOxfordUniversityPressonbehalfof theSocietyforMolecularBiologyandEvolution.Allrightsreserved. Forpermissions,pleasee-mail:[email protected] 1980 BurgessandYang from multiple species, but few comparative primate data sets were available in 2003. Inthisstudy,weusedMCMCCOALforBayesianco- alescent analysis of a genome-wide data set of ;15,000 neutral loci (7.4 Mb) from 5 primate species (Patterson et al. 2006). We updated the alignments to incorporate thehigh-qualitygenomeassemblysequencenowavailable forchimpanzeeandmacaqueandtrimmed theerror-prone endsofshotgunsequencingreads,leadingtoimproveddata quality. The basic model in MCMCCOAL (Rannala and Yang2003)wasextendedtoaccommodatevariablemuta- tionratesamonglociandtoallowforspecies-specificran- domsequencingerrors.Wediscusstheimplicationsofour parameter estimates for the speciation process between human and chimpanzee. Materials and Methods Sequence Data We retrieved the alignments of human, chimpanzee, gorilla, orangutan, and macaque (HCGOM) of Patterson etal.(2006).Thosedata weregorilla whole-genome shot- FIG.1.—Thespeciestreeforhuman(H),chimpanzee(C),gorilla(G), gunreadsalignedtohumanassemblysequence,tomultiple orangutan(O),andmacaque(M)showingtheparametersinthemodel.For preassemblyreadsofchimpanzeeandmacaque,andtolow- eachancestralpopulation(referredtoasHC,HCG,HCGO,HCGOM),2 coverageorangutanreads.Wecombinedoverlappingreads parametersaredefined:h54Nlgands5Tl,whereNistheeffective into a single consensus sequence for each species and populationsize,listhemutationratepersiteperyear,gisthegeneration extracted 50,321 segments with at least 300 sites aligned time,andTisthespeciesdivergencetime.SequencingerrorsinC,G,andO aremodeledbyexcessbranchlengthse ,e ,ande . forallspeciespresentinthealignment(table1,rowa).With C G O lowerreadcoveragefororangutaninthepublicdatabases, 38% of segments included only the 4 species HCGM. In probabilitythatnocoalescenceoccursintheancestralpop- most cases, segment ends corresponded to the end of a ulation HC gorilla shotgun read, or less often an orangutan read, with mean length ;700 sites. 2 P 5 e(cid:2)2ðsHCG(cid:2)sHCÞ=hHC ð1Þ Weimprovedthequalityofthesequencedatabyincor- SG 3 poratingthemostrecentgenomeassemblydataforchimpan- (Hudson 1983). This probability is greater for larger h zee(panTro2,WashUbuild2.1,October2005)andmacaque HC andshorterinternodetime(s (cid:2)s ).Inrealdatasets, (rheMac2,Baylorbuild1.0,January2006).Humanlociwere HCG HC thetruegenetreesareunknown,andacommonstrategyis identifiedintheUniversityofCalifornia,SantaCruz(UCSC) the so-called ‘‘tree-mismatch’’ or ‘‘trichotomy’’ method Genome Bioinformatics pairwise BlastZ alignments of the (e.g., Chen and Li 2001), by which the theoretical proba- mostrecenthumanassemblyhg18(NCBIbuild36.1,March bility (P ) is equated to the proportion (P ) of mis- 2006)againstchimpanzeeandmacaque.Fourpercentofthe SG SE matches between the species tree and the estimated gene alignmentswerediscardedbecausethelocimappedtobreaks tree.However,Yang(2002)pointedoutthaterrorsingene intheUCSCalignmentsforHCorHM.Thesequenceswere treereconstructioninflatethemismatchprobabilitysothat realigned using MUSCLE (Edgar 2004). Default settings P isalwaysgreater thanP ,andthismethodmayseri- were used, except for a reduced gap-opening penalty of SE SG ously overestimate ancestral population size h (see also 250,foundempiricallytoreducethenumberofpooralign- HC below). ments.Inasmallnumberofcases(144,0.3%),substituting Analternativetothetree-mismatchmethodisthefull UCSC-alignedchimpanzeeandmacaquesequencesandre- likelihood approach, including both maximum likelihood aligningresultedinmorethandoublingofatleastonepair- (Takahata et al. 1995; Yang 2002) and Bayesian methods wisedistance.Theselociwerediscardedaseithertheoriginal (RannalaandYang2003).Thelikelihoodfunctionaccom- alignmentsortheUCSCalignmentsmayhavebeenspurious. modatesuncertaintyinthegenetrees,weightingeachpos- Alignmentendscorrespondedtotheerror-proneends sible tree by its probability of occurrence. Another ofGorOshotgunsequencingreads(seeResults),sothey advantageofthelikelihood-basedapproachisthat ittakes weretruncatedby100sites.(Truncatingmoresitesshowed account of the branch lengths in the trees, which provide negligible further improvement in quality.) This approach information about gene coalescent times. The maximum was preferred to the use of PHRED scores for individual likelihood calculation involves multidimensional integrals base calls because our model required continuous align- and is practical for small data sets only. The Bayesian ments (without internal gaps). Furthermore, the use of MarkovchainMonteCarlo(MCMC)algorithm,asimple- PHRED thresholds may introduce bias (Johnson and mentedintheMCMCCOALprogram(Yang2002;Rannala Slatkin 2008), and they do not account for all sources of andYang2003),wasfeasibleforanalyzinglargedatasets sequencing error. We preferred a conservative curation HominoidAncestralPopulationSizes 1981 M 04443514432 toavoidtheriskofbiasoroverfiltering,whichmayremove H 4.4.4.4.4.4.4.5.4.4.4. genuinelyvariablesites.Ourdatasetstatistics(seeResults) j O 80722319220 showed that we obtained high-quality continuous align- a’s H 3.4.3.4.4.4.4.4.4.4.4. ments of ;500 bp. Residual sequencing errors were dealt Kimur HG 3.43.73.04.34.34.44.24.94.44.34.1 withCunodmeprlethteewexastetnhdeerdefimnoeddeiln.clusiveautosomaldataset, fromwhich2datasetsofneutrallyevolvinglociwerecom- C 63232411242 H 3.4.4.4.4.4.4.5.4.4.4. piled.ThefirstwascalledNeutral.StartingwithComplete, weremovedlociwithin1kbofUCSCknowngenes(Hsu 24128865058 M 74734838854 etal.2006). Wealso removedsome lociidentified byRe- 666666676654 O 000000000008 0.0.0.0.0.0.0.0.0.0.0.0. peatMasker (Smit AFA, Hubley R, and Green P, unpub- lished data), including noncoding RNA genes, simple 98658737368 M 74023727743 sequence repeats, and low-complexity regions. However, 667666676654 G 0.00.00.00.00.00.00.00.00.00.00.00.8 we masked only a small subset of transposable elements (TEs) in families that were potentially active in the past 24647746079 M 51112616632 40 Myr (see below). We also masked segmental duplica- 666666676654 C 0.00.00.00.00.00.00.00.00.00.00.00.8 tions identified by the UCSC browser table for hg18. Re- maining loci were uncharacterized but assumed to be M 3513131225641162573329 neutrally evolving. From a separate curated data set Com- 666666676655 H 0.00.00.00.00.00.00.00.00.00.00.00.8 pleteX,thesameprocedureswereappliedtoX-linkedloci es to compile the Neutral X data set. c stan O 3943804623473593723444213713522860 The second autosomal data set was called Inert TE. Di G 0.00.00.00.00.00.00.00.00.00.00.00.8 Here, in contrast to the conventional masking approach 69 (above) that assumes uncharacterized loci are neutral, we C wiseJ CO 0.03690.03470.03750.03360.03480.03600.03340.04080.03580.03400.02770.80 ploogseitniveteilcyeivdiednetnicfieedfowrneellu-ctrhaalirtayc.teArilzaerdgeTpErolpoocritiwointhofpphryi-- air mate genomesisrecognizable transposons, mostof which P O 3533453733343463573314063543382780 havebeenassignedtowell-characterizedfamiliesforwhich H 000000000008 detailed dated phylogenies are available (see below). We 0.0.0.0.0.0.0.0.0.0.0.0. selected elements, identified by RepeatMasker, known to 09150424768 have been inert evolutionary fossils for at least 40 Myr. G 21182516171716191716131 C 000000000008 In practice, there was considerable overlap between the 0.0.0.0.0.0.0.0.0.0.0.0. NeutralandInertTEdatasets.Themajordifferenceswere 57838192457 thatInertTEincludedintronictransposonsthatwereabsent G 19182416161715191716132 H 000000000008 fromNeutral,andthatNeutralincludeduncharacterizedin- 0.0.0.0.0.0.0.0.0.0.0.0. tergenicuniquesequence.InertTEwaspartitionedintothe 62317802148 4 major primate TE classes as follows: HC 01301201201201201201201401301200870 0.0.0.0.0.0.0.0.0.0.0.0. Longinterspersed nuclearelements(LINEs): Theyoung LINE families L1H, L1P1-3, and L1PA1-7 were bp) recently active and were removed; older L1Ps and all ( e L1MshavebeeninertsinceOldWorldmonkey(OWM) z 24048135277 Si 7069204950323121292347 divergence and were included (Khan et al. 2006). The n ea Lyon repeat hypothesis (Lyon 1998) proposed a func- M tionalroleforLINEsontheXchromosome,butexclusion oci bold. ofX-enrichedL1elements(Baileyetal.2000)fromour L n data made little difference to the results, and they were mberof 50,32148,467 48,46714,66319,1929,8779,0414,2132,810783 hlightedi Shionrctliundteerdspinertsheedrnesuuclltesasrheolewmne.nts (SINEs): Progressive Nu hig enrichment of Alu SINEs in GC-rich regions has led to are speculationaboutafunctionalrole(Landeretal.2001),but Table1DataSetStatistics DataSet Datarefinement(a)Uncuratedrawalignments(b)IncorporateC,Massembly(c)2100-bpends(removed)(cid:3)(d)CompleteNeutralInertTELINESINELTRretrotransposonDNAtransposonNeutralX/dd(X)(A) Resultsfortheprincipaldatasets LotaacAAimdnhnpgaogcletoupaugecniserJeaotneosa,enbrspmrrdsSotsmeeiouxernAntsfirs1goncesl,luead(yreral–Qe,unronAtelrroufrdepieolnecguepvnSlhoaeeyitmgunterianu1tt(ectehMsttn1aeiowe(clt9LIpslyeR9eivTorcl(2)a(eiRuBEc)r(rn.iLi)aeRangstcatzceVrionAeoelluemdstrnlrc)deuobtreeiYatipodvernnanr,teadfonn;aatabislmsop.piaDnn2uibowlr0elysyieiyi0ofsnorwy1neiuit)dnsanhnm:geasgegrneaeirdvespmxTelerctehlmoh2lemeueb0cfadare0tAloeieibn2oanmldlt)nnues-;. 1982 BurgessandYang retroviruses, which lost infectivity but retained the e , e , and e to the lengths of branches leading to C, G, C G O ability to transpose within the host genome. However, andOinthegenetrees(fig.1).Asthebranchlengthismea- the evolutionary dynamics of moderate and low-copy suredasthenumberofchangespersite,theeshererepre- ERV families are often uncertain and may include sent the error rate per base pair. Each of e , e , and e is C G O reinfection (Bannert and Kurth 2006). Accordingly, we assigned the gamma prior G(1, 1000), with mean 0.0010 adopted a conservative approach, including only the and95%credibilityinterval(CI)0.0003–0.0037.Notethat well-characterized ancient nonautonomous mammalian thegammadistributionG(a,b)hasmeana/bandvariance LTR retrotransposons (MaLR) and LTR class III (or a/b2.Posteriordistributionsofe ,e ,ande aregenerated C G O ERV-L),whichareinertinhigherprimates(Smit1993; by the MCMC algorithm. Because accommodating errors Lander et al. 2001). inM(bytheuseofparametere )wouldcreateanidentifi- M DNA transposons: There is no evidence of DNA trans- abilityproblem,e isnotestimated.Insomeanalyses,e is M M poson activity in the past 50 Myr, so all were included assignedafixedvaluetomodelahighermutationratespe- (Lander et al. 2001). cific to the macaque lineage, as proposed in the hominoid slowdown hypothesis. An assumption of the model is free recombination Ourmodel assumes that errors affect sites andloci at between loci. There is at present no definitive biological random,atthesamerateforthewholespecies.Iftheerror model for patterns of recombination over evolutionary rate varies overgenomicregions(e.g., due todifferent se- time. Recombination appears to be concentrated at poorly quencing coverage), it may be more realistic to let e vary characterized hot spots, which are not conserved even among loci according to a prior. This is not pursued here. between humans and chimpanzees (Ptak et al. 2005). We calculated an order-of-magnitude estimate of therecombi- nationratebyconsideringthetotallengthofthehumange- Variable Mutation Rates among Loci nome linkage map, ;3000 cM (Kong et al. 2004), across the3(cid:3)109bpgenome,implyingameanrateofc(cid:1)10(cid:2)8 We implement 2 methods to accommodate variable mutation rates among loci. In the first, fixed-rates model, perbasepair pergeneration. Withagenerationtime of15 we estimate the relative mutation rate for each locus by years,thisamountsto1crossoverper1.5kbperMyr.Thus, theaverageJC69distancesbetweenmacaqueandthe4apes: a minimum separation of 10 kb between loci seemed ade- quatetoapproximatetheassumptionoffreerecombination dHCGO-M5ðdHM þ dCM þ dGM þ dOMÞ=4: ð2Þ overthetimescaleofhominoidandOWMevolution.Loci were culled accordingly as the final step in preparation of Thisaveragedistanceisscaledsuchthatthemeanacrossall the Neutral and Inert TE data sets. lociis1,andtheresultingrelativeratesareusedtoanalyze data from 4 species only (HCGO) (Yang 2002). The second model, referred to as the random-rates Bayesian Inference model, assumes random rate variation among loci. Let We use the Bayesian method of Rannala and Yang the rate for locus i be r, with i 5 1, 2, ..., L. To avoid i (2003; see also Yang 2002), implemented in the overparameterization, the average rate is fixed at one: MCMCCOAL program. The method can be used to an- (cid:1)r5PL r=L51. Parameters hs and ss are then defined i51 i alyze sequence data from multiple loci from several using (cid:1)r. We assign a Dirichlet prior on the transformed closelyrelatedspecies,accountingforthespeciesphylog- variables y 5 r/L, i 5 1, 2, ..., L. i i enyandrandomcoalescenteventsinextantandancestral species. The JC69 mutation model (Jukes and Cantor CðLaÞ YL XL fðy ;y ;...;y jaÞ5 ya(cid:2)1; y.0; y 51: 1969) is used for its computational efficiency and for 1 2 L ½CðaÞ(cid:4)L i i i the well-known robustness of analysis of highly similar i51 i51 sequences to the assumed mutation model. Here the role ð3Þ ofthemutationmodelistocorrectformultiplehitstoes- The marginal mean and variance are E(y) 5 1/L and timatethegenetreetopologyandbranchlengths.Inpre- i var(y) 5 (L (cid:2) 1)/[L2(La þ 1)]. Thus, the prior density vious studies, parsimony and neighbor-joining produced i on the rates, r 5 Ly, i 5 1, 2, ..., L, is similar trees, and within the apes, even the infinite sites i i model produced very similar estimates to the finite-sites L model (Satta et al. 2004). More complex models such as fðr1;r2;...;rLjaÞ 5 CCððLaaÞLÞ Qðri=LÞa(cid:2)1(cid:3) L1L HKYþC (Hasegawa et al. 1985; Yang 1994) are thus i51 deemed unnecessary in such analysis. The basic model 5 CðLaÞ YL ra(cid:2)1; r.0; assumesneutralevolutionataconstantmutationrate,free ½LaCðaÞ(cid:4)L i i i51 recombinationbetweenloci,andnorecombinationwithin L P a locus. Several extensions to the basic model are intro- ri5L: duced in this study. i51 ð4Þ Model of Sequencing Errors and Violation of the WehaveE(r) 5 1,var(r) 5 (L (cid:2) 1)/(La þ 1)(cid:1)1/afor Molecular Clock i i large L, and corr(r, r) 5 1/(L (cid:2) 1). Parameter a is thus i j The human sequences are assumed to be error free. inverselyrelatedtotheextentofratevariationamongloci, Sequencing errors in C, G, and O are modeled by adding andtheimpactofthepriorcanbeassessedbychanginga. HominoidAncestralPopulationSizes 1983 Note that both equations (3) and (4) are (L (cid:2) 1)- pliedtotheNeutraldataset,toexaminetheimpactofmodel dimensional densities. The first (L (cid:2) 1) rates are working assumptions about neutrality, mutation rate variation, and variablesintheMCMC,withthelastrater 5L(cid:2)PL(cid:2)1r. recombinationandtoassessthesensitivityofposteriores- L i51 i Aslidingwindowisusedtoupdater,i 5 1,2,...,L (cid:2) 1. timates to the prior. We describe these results first, which The new rate is generated as r(cid:5);Ui (cid:1)r (cid:2)e(cid:2)2;r þe(cid:2)2(cid:3), alsoservetoidentifythebestmodelforourdata.Wethen i i i reflected into the feasible range (0, r þ r ) if necessary, presentanddiscussparameterestimatesobtainedunderthe withr(cid:5)5r (cid:2)(cid:1)r(cid:5)(cid:2)r(cid:3).TheproposalriatioiLsone.Theprior bestmodelfromtheComplete,InertTE,andNeutralXdata L L i i ratio of the move is, according to equation (4), sets. Underthebasicmodel(RannalaandYang2003),we f(cid:1)r1;r2;...;ri(cid:5);...;rL(cid:5)(cid:4)(cid:4)a(cid:3) 5(cid:5)ri(cid:5)rL(cid:5)(cid:6)a(cid:2)1: ð5Þ assignedthesamediffusegammapriorG(2,500)tothe4h fðr ;r ;...;r;...;r jaÞ rr parameters,withmean0.0040and95%CI0.0005–0.0111. 1 2 i L i L Diffusegammapriorswerealsoassignedtothesparame- ters: G(4, 606) for s , G(4, 465) for s , G(4, 219) for HC HCG Implementation of Bayesian MCMC Algorithm sHCGO, and G(4, 131) for sHCGOM, with means 0.0066, 0.0086, 0.01826, and 0.0305, respectively. Those prior Foreachanalysis,werantheMCMCCOALalgorithm means were calculated using the species divergence times at least 3 times to confirm convergence. The MCMC was of Steiper and Young (2006) and the mutation rate 10(cid:2)9 foundtohavegoodconvergenceandmixingproperties.For changes per site per year. analysesunderdifferentmodels,weusedaburn-inof5,000 The posterior means of the parameters are shown in iterations and then took 25,000 samples at every second table 2, row a. The 95% equal-tail CIs are narrow around iteration. The SD across the 3 runs was ,0.5% of the the mean, indicating the information content of this large posteriormeanofthesameparameter.Forthemajorresults, data set. At theposterior mean parameter values, the gene weranlongerchains,with10,000iterationsforburn-inthen tree for H,C, and Gis expectedto differ from the species 100,000samples.Therewasnoapparentdifferencebetween treeatP 5 29%oftheloci(eq.1).Thisislowerthanthe SG the long and short runs. P 5 40% in the phylogenetic analysis of a similar data SE set by Ebersberger et al. (2007), but as discussed earlier, P is an overestimate of P due to tree reconstruction SE SG errors.Ebersbergeretal.thenconsideredonlylociatwhich Results the approximate posterior probability for the gene tree Data Compilation and Quality exceeded 0.95, obtaining the mismatch probability 0.23. Table1(rowsa–d)showsourupdatingandrefinement This is a serious underestimate because conflicting gene of the HCGOM alignments of Patterson et al. (2006), by treestendtohaveshorterinternalbranchlengthsandweak- incorporating recent high-quality assembly sequence for ersupportthanmatchinggenetrees,andapplicationofthe C and M and by trimming 200 bp at the error-prone ends cutoff must have disproportionally removed conflicting of the alignments. Improvement of data quality was indi- gene trees. Our estimate, lying between 0.23 and 0.40, cated by reduction in pairwise distance estimates, better is thus qualitatively consistent with the estimates of conformity with the molecular clock, and approximately Ebersberger et al. equal estimates of the transition/transversion rate ratio j (a/b in the notation of Kimura 1980) for all species pairs. Random Sequencing Errors or Violation of the Clock? Itwasapparentthatsequencingerrorsshowalowerjthan mutations.OurcuratedHCdistanceestimate(0.0121)was Afterdatarefinement,theNeutraldatasetstillshowed almost identical to the genome-wide estimate provided by some branch length variation. By comparison with the H the Chimpanzee Sequencing and Analysis Consortium lineage and with reference to outgroup M, excess branch (0.0123) (Mikkelsen et al. 2005). lengths were 0.0002 (50.0627 – 0.0625) in C, 0.0013 in Thiscurationprocessyieldedtheinclusiveautosomal G, and 0.0023 in O (table 1). We analyzed the excess datasetComplete.Fromit,weselectedourprincipalauto- branchlengthsbycountingsitepatternsforspeciestriplets somaldatasetNeutralbytheconventionalmethodofmask- (table3).Theproportionsofsiteswithtransitional(S)and ingfunctionalregions.(FortheXchromosome,aseparate transversional (V) differences were counted on each line- NeutralXdatasetwascompiledbythesamemethod.)Our age. We subtracted the counts for H from the counts for secondautosomaldataset,InertTE,containsonlyTEsthat each of C, G, and O to obtain excess proportions S and are well characterized and known to have been inert evo- VfortheC,G,andOlineages.Thenjwasestimatedunder lutionary fossils for at least 40 Myr. We also analyzed 4 the K80 model partitionsofInertTEcorrespondingtothe4majorclasses jˆ52 (cid:3) logð1 (cid:2) 2S (cid:2) VÞ=logð1 (cid:2) 2VÞ (cid:2) 1 ð6Þ of primate TE: LINEs, SINEs, LTR retrotransposons, and DNA transposons. (Kimura 1980; Jukes 1987). This is close to 2S/V when S andVaresmall.Thehumansequencewasassumedtohave noerrors,providingagoodestimateofjduetoevolution, Analysis of the Neutral Data Set at4.6.IfalltheexcessinCwasduetosequencingerrors,j Basic Model for sequencing errors was ;1.6. Accordingly, the j esti- BoththebasicmodelofRannalaandYang(2003)and mates suggest that the excess for Oðjˆ52:2Þ was mainly the extended models implemented in this article were ap- duetosequencingerrors,whereastheexcessforGðjˆ53:6Þ 1984 BurgessandYang PSG 0.290.310.330.310.31 C,and waragsummeonret rceolinesdisotenntthweitahssaumhipgthioernetvhoaltuetivoonlaurtyiornataen.dThseis- –1.8) –1.9)–1.8) forH, qoufetnhceinfigniesrhroedrsghoarvielladiastnindcotrjanragtuiotasn.Tgheenoevmeentausaslermelbelaieses eO 1.7(1.5Fixed1.7(1.61.7(1.5 probability mmoaydarPeteavrbeaarmalntehcteehrtslreueneCge,txhetGve,anratinoadfticoelnoOcfwokrevCrieo,lGiant,tiroaonnddiunOctehddeusteeotsoapceecicotihmeesr-. –1.2)–1.7)–1.2)–1.2) match species-specificsequencingerrorsorviolationoftheclock eG 1.01.51.01.0 mis (fig.1).AgammapriorG(1,1000),withmean0.0010and (((( e 95%CI0.0003–0.0037,wasassignedoneache.Thepos- 1.11.61.11.1 netre teriormeanswere0.0002foreC,0.0011foreG,and0.0017 C 1–0.3)5–0.6)1–0.3)1–0.3) tree–ge lfeonrgetOhs(toabbtlaein2e)d. Tbhyessiemwpleeredissitmanilcaer ctoalcthuelaetixocnesss(0b.0ra0n0c2h, e 0.2(0.0.6(0.0.2(0.0.2(0. species 0.001T3h,eanindtr0o.d0u0c2t3io,nreosfpeectpivarealym;esteeersahbaodvemainnidmtaalbleeff1e)c.t OM 2–26.5)8–26.1) 0–29.5)2–24.6) isthePSG oe(trnabealssetiem2x)a.cteeAssssofbehxraspn.ecTchtheedled,nievgsethragsnednwcetehreteimacecoscrro(esmsspm)oboneddcaiantmegdessbsmyhaalelds- sHCG (26.(25. (29.(24. used. strongnegativecorrelations(table4).Furthermore,thepriors 26.426.0NA29.224.4 asnot ossn(essuphpaldemviertnutaalrlyytnaobleefSfe1c,tSounpppolesmteerinotrareystMimaatteersiaolfohnslinane)d. et 4.7)3.8)3.8)4.5)4.6) ewM Overall,theeffectsofaccommodatingsequencingerrorsin NeutralfortheDataS ssHCGHCGO 6.7(6.7–6.8)14.6(14.5–16.2(6.1–6.3)13.7(13.6–15.9(5.8–6.0)13.6(13.5–16.3(6.2–6.4)14.3(14.2–16.3(6.2–6.4)14.5(14.3–1 fixed,whereasinothermodels, tittsThhnhaehbeeaqetlnuhmeudereTneno1scshccd,uuiueeenrlrrslotgaamstwwtiteemeowreddrdaraeoee.tdrdrelesaTmasatotshaaisdfhni.wsisoofsefweerdet,lq,alnrputeasaaseidtnarnssttcpcleityosuaitntnbribwigscloeeisaudces5saefro.ruarkoarlsUnibregwsonlnoywdhwmfeifncaterrhohsnttemohtaswell(ioseesnbwoesrcaetelisaeumMisprdchrapeeoomtltrehiwmesoordnadadotnesebit)lyons-,. ntModels sHC (4.0–4.2)(3.7–3.9)(3.5–3.8)(3.8–4.0)(3.8–4.0) 50.0082was tew(acxieocnrmeeepdpgtarftrerheoaetmwlmyittohrhesettdaudcbciulseertdaa2tne)ut.dnaAdnNescereeustxthtrpreaaellcmptdeooaddpt,aeutlhlsaoeetftipo.sonHes,qotteuhwreeienovdcrieienfrfsg,etirfmeeornrracotaeerlsssl re 4.13.83.73.93.9 M of error rates (es) were much higher than for Neutral. Be- e e underDiff hHCGOM (11.5–12.2)(11.5–12.2) (4.8–5.8)(6.4–7.2) oidslowdown, cimtthaeousedsceeaumtetrheaeatdenmdlyikodredealemtyalatphisenearittfnotgahrnmeldoemwdthoewadrteelletllhvweoeolniunotlfhfdeisrsseeuunqcncuceceeunswrcsafiatnuesgldlyreodrabarcutoacsrostsmeittno-, eters 11.811.8NA5.36.8 Homin acaspmedallthneucmubraetrioonf psproucrieosus.s alignments that may have es- parentheses)ofParam hhHCGHCGO 3.4(3.3–3.5)6.1(5.9–6.4)3.3(3.2–3.4)6.1(5.8–6.3)3.5(3.4–3.6)6.1(5.8–6.4)3.3(3.2–3.4)4.9(4.7–5.2)3.2(3.2–3.3)4.7(4.4–5.0) 3ametersarescaledby10.In(e)equation(1). tTCDihshoie2venNemrterrNggfoioboeedrunnteeetelc,irteoithashntaeigtooiatnfvhvseeAe,nraneavqcbgueeyersivatHrgaa–leelCcnPotdoatiollvyeeamsrcdgoeiernsnpttachtneiimscdmeeHobCtfoeehtxwH/2pe–(eeCc5nte2d2aNullngedllee)sr. Table2PosteriorMeansand95%CIs(in hHC (a)Basicmodel6.5(6.0–6.9)(b)Sequencingerrors6.2(5.8–6.6)(c)Fixed-ratesmodel6.3(5.8–6.9)(d)Random-ratesmodel6.1(5.7–6.6)(e)Hominoidslowdown6.1(5.7–6.6) hseN.—NA,notapplicable;,,andparOTEG,calculatedusingtheparameterestimatesby ewnh12teedrrHienCPg55þþþþHthCss12121212e5HHPPPPHCCeHHHH(cid:2)þCCCCCþ2PPPP12ðasHHHH(cid:1)n12HC1CCCCchGGGGGe(cid:2)H(cid:2)s(cid:1)P(cid:1)PCstPh1HrHHaCHþHCCÞl(cid:2)C=CGGphG(cid:3)OO12HoOhPC(cid:1)hPpHhHHHiu(cid:2)CsHCCClaGCGþt(cid:1)hthGOOihe12oHOM(cid:3)HPnMChp;CGHHrGdo(cid:3)(cid:2)CCobG(cid:1)(cid:2)anO1hboHhi(cid:2)tlCHictGCyoPO(cid:3)atH(cid:3)lheCasGtc(cid:3)2ehiaHnlCletGðhl7easÞt HominoidAncestralPopulationSizes 1985 Table 3 Proportions ofSites with Transitional(S) andTransversional (V) Differences whenthe Human IsComparedwith Another Ape in theNeutral DataSet HumanLineage ExcessintheOtherApeLineageoverHuman SpeciesTriplet TotalSites S(SXX) V(VXX) j S(XSX–SXX) V(XVX–VXX) j HC–M 23,285,394 0.003933 0.001727 4.57 0.000068 0.000084 1.61 HG–M 23,264,472 0.004931 0.002164 4.58 0.000778 0.000429 3.63 HO–M 14,034,931 0.010072 0.004363 4.66 0.001287 0.001152 2.24 NOTE.—HiscomparedwithC,G,orO,withtheoutgroupMusedasreference.Forexample,HandCarecomparedinthefirstrow.Theproportionofsiteswithpattern SXX(wherehumanhasatransitionaldifferencewhileCandMareidentical)is0.003933.TheproportionofsiteswithpatternVXX(wherehumanhasatransversional difference)is0.001727.These2proportionsgivej54.57byequation(6).Similarly,transitionsandtransversionsarecountedfortheClineage.Theexcessproportionsof transitionsandtransversionsfortheClineageovertheHlineageareshowninthetable;thesegivethejestimate1.61.Rows2and3aresimilarcomparisonsforHGandHO. population (see eq. 1), and PHCG5e(cid:2)2ðsHCGO(cid:2)sHCGÞ=hHCG and tion of ancestral polymorphism to H–C sequence diver- PHCGO5e(cid:2)2ðsHCGOM(cid:2)sHCGOÞ=hHCGO are defined similarly. gence was unchanged, at 39%. With the parameter estimates from the Neutral data set under the basic model (table 2, row a), equation (7) Rate Variation among Loci gave 1dˆ 50:00410þ0:00181þ0:00074þ0:00001þ 2 HC 0:0000050:00666,implyingthat39%ofthedivergencebe- Wefirstimplementedthefixed-ratesmodel,usingma- tweenHandCisduetoancestralpolymorphism(contributed caque to estimate relative rates (eq. 2) and analyzing data by the last 4 terms in eq. 7). from4speciesonly(HCGO).Again,sequencingerrorsinC However,thepredicteddivergence1dˆ 50:00666dif- andGwereaccommodatedbye ande .Withtheremoval 2 HC C G fered from the average JC69 distance d /2 5 0.01270/ ofmacaque,e couldnotbeestimatedbecauseitwascon- HC O 2 5 0.00635 of table 1. Here we treat the JC69 distances foundedwiths .Instead,e wasfixedat0.00173,the HCGO O asobservationstoassessthefitofthemodeltodata.Similar estimateundertherandom-ratesmodel(seebelow).Thees- comparisons of predicted and observed H–G, H–O, and timatesofhandsforthe3ancestralpopulationsHC,HCG, H–M distances did not show such a discrepancy (results andHCGO(table2,rowc)weresimilartothoseobtained not shown). We note that only the H–C and HC–G diver- undertheassumptionofthesamemutationrateforallloci gencesaresufficientlycloseintimetogenerateasignificant (table 2, row b). proportion of gene tree–species tree mismatches, and the The fixed-rates model estimated relative rates using treemismatchinformationinthedataishighlyinformative macaque data by ignoring polymorphism in the ancestor abouttheparametersinequation(1)(h ,s ,ands ). HCGOM, which might be substantial if N were HC HC HCG HCGOM Under the basic model, it appeared that the excess branch large. We thus implemented the random-rates model, in lengthinGduetosequencingerrorscouldnotbereconciled which rates for loci were assigned the Dirichlet prior withthe treemismatchinformation inthe data, resulting in (eq. 4), with parameter a inversely related to the extent a poor fit to the observed HC distance. of rate variation among loci. We used a 5 25, so that Thefitimprovedconsiderablywhensequencingerrors the relative rates have SD 0.2 in the prior, slightly lower were accommodated in the model. As no errors were than the empirical SD (0.278) calculated by equation (2) assumed on the H lineage, the predicted average H–C for the fixed-rates model. The estimate for h HCGOM divergence was equal to the 5 terms of equation (7) plus (0.0053) was about half that obtained under the basic 1e , or 1dˆ 50:00385þ0:00165þ0:00077þ0:00002þ model. The inversely correlated parameter s in- 2 C 2 HC (cid:2) HCGOM 0:00000þ0:00020 250:00639 (using parameter estimates creased slightly from 0.0260 to 0.0292. Parameters for from table 2, row b). This was very close to the observed morerecentancestralpopulations(HC,HCG,HCGO)were value d /2 5 0.00635 (table 1). The estimated contribu- well estimated and changed little (table 2, row d). HC Table 4 Correlationbetween Parametersin thePosteriorDistribution h h h h s s s s e e HC HCG HCGO HCGOM HC HCG HCGO HCGOM C G h (cid:2)0.26 HCG h 0.01 (cid:2)0.04 HCGO h 0.00 (cid:2)0.03 (cid:2)0.12 HCGOM s 20.77 0.19 (cid:2)0.02 0.03 HC s 0.20 20.42 0.00 0.06 0.21 HCG s 0.01 0.12 20.67 0.10 0.18 0.31 HCGO s 0.00 0.05 0.13 20.90 0.03 0.05 0.01 HCGOM e 0.01 0.00 0.02 (cid:2)0.05 20.41 20.53 (cid:2)0.27 (cid:2)0.03 C e (cid:2)0.03 0.00 0.02 (cid:2)0.05 (cid:2)0.25 20.59 (cid:2)0.28 (cid:2)0.04 0.40 G e (cid:2)0.01 (cid:2)0.08 (cid:2)0.02 (cid:2)0.03 (cid:2)0.15 (cid:2)0.27 20.43 (cid:2)0.09 0.23 0.24 O NOTE.—TheNeutraldatasetwasanalyzedundertherandom-ratesmodel(Table2,rowd).Strongcorrelationsarehighlightedinbold. 1986 BurgessandYang G 71 WechangedaintheDirichletpriortoassessthesen- PS 0.20.3 sitivitytotheprior(supplementarytableS2,Supplementary 9) Materialonline). EstimatesofhHCGOM,andtoalesserde- 2. grees ,weresensitivetoa.However,parameteres- O 7– HCGOM e 2. timatesfortheotherpopulations(HC,HCG,HCGO)were ( 8 stable, showing little change with wide variation in the 2. prior. We concluded that this data set contained sufficient 9) informationonratevariationamonglocitoobtainreliable 3. G 8– estimates for all populations except the earliest,HCGOM. e (3. Accordingly,estimatesforh wereconsideredunreli- 9 HCGOM 3. able and were discarded from our final results; s HCGOM 1) estimates were retained but should be considered 1. approximate. – C 0 e 1. ( 1 1. Hominoid Slowdown 7)0) Thereisconsiderableevidenceforaratedifferencebe- 6.9. OM 5–27–2 tweenOWMsandthegreatapes(Goodman1961;Lietal. sHCG (26.(28. 1an99a6p;pYroipertiaatle.o2u0t0g2r;oKupimtoeptraolv.i2d0e0n6e).wOiunrfodramtaatsieotnlaacbkoeudt 68 6.8. such a rate difference; but if it exists, it might affect our 22 estimates.WeranMCMCCOALwithe 5 0.0082fixed. 7)4) M 5.4. Thisrepresentedaone-thirdhighermutationrateonthema- 11 O –– caque lineage than inthe apes. As would beexpected, the G 53 sHC (15.(14. posteriorestimateofsHCGOMbecamemuchsmaller(table2, 63 row e). However, our estimates for population sizes and a 5.4. at 11 speciationtimesinthe3morerecentancestralpopulations D ated CG 8–7.8)2–6.3) (fHerCen,cHeCinGm,HutCatGioOn)rwateerebertowbeuesnttOoWtheMpraonpdotsheedhlaormgeindoiifd- ur sH (7.(6. lineage. Unc 7.86.2 he 7)9) Robustness to Priors t 4.3. from sHC 6(4.6–8(3.8– timatTesheweafsfeacstsoesfshedanidnssuppripolresmoennptaorsytetraibolrepsaSra3maentderSe4s-, ed 4.3. Supplementary Material online. Again, posterior distribu- n ai 0) tionswerefoundtoberobusttodrasticchangesinthepri- metersObt hHCGOM 4.8(14.6–15.8.7(8.4–9.0) onoinrafgsri.rnobfIwoonrthmscuoaemtnxifiocmendaseisrnnycb,tehrwaeinendctaheftoravl.ueaWnnldsge,thrcrosoebflnauecnscldtutnidnereasgdtsettthhvoaeatralailaaclrtcgipooernmioamarmmsoodoauannntgd-t ra 1 locididnotoverspecifythemodelfortheNeutraldataset. s)forPa hHCGO (7.1–7.)(5.5–5.8) Locus Length and Within-Locus Recombination e 26 es 7.5. Withanaveragelocuslengthof508bp(Neutraldata h arent G –4.5)–3.9) 310 suentr)e,atlhiestiacssouvmerptthioentiomfensocarleecoofmhboimnaintiooindwevitohliuntiaonlo.cInusoirs- Is(inp hHC 4.4(4.43.8(3.8 scaledby ddseeagrtamtoseentatssoswfesaitshspthpeecroifiigmeredpsalsecinvtgeoltyhf frsrehocomormtaebrrailnnoadctioiombny,psowasemitigpoelninnwegriatohtenidne C e and95% hHC 7.0(6.8–7.2)6.2(6.0–6.4) eparametersar rmreeeaccacoothemmslbbowiicnnouaausttliidooinnnvatwwhryoeeruecNlodaenuhsstaierdvraieelorauadbsagltpyraeroawstbeeilttreh.mimWlo,pcetahucesptrlpeoeadnnrigacltmothenedgbteeetrrhcaaesutestsgiie-f- Table5PosteriorMeans BasicmodelSequencingerrors hsN.—,,andOTE mTrddTeaihhdeftfeunaues.tcr9s,Nee.5onde%Tut,vrhfrereepCrofltahmIrerseeacslmttbeuhisenleostcgtss,eaetrmtohhoefeeebsptttrwhiaoemiiidssndtaueeetacdrerniesoafdwrrdloyhimindmseifensnoatorhnatmhtesreeaaffpotusisprolhelehn-goalsmwercwnnoetgoenntirtthbneesendnittzaaogebtitranelvewasettehral6tyyes.. HominoidAncestralPopulationSizes 1987 Table 6 PosteriorMeansand95%CIs(inparentheses)ofParameterswhenShorterLociWereSampledfromtheNeutralDataSet LocusLength h h h s s s s P HC HCG HCGO HC HCG HCGO HCGOM SG 508 6.2(5.8–6.6) 3.3(3.2–3.4) 6.1(5.8–6.3) 3.8(3.7–3.9) 6.2(6.1–6.3) 13.7(13.6–13.8) 26.0(25.8–26.1) 0.31 333 6.7(6.1–7.4) 3.5(3.3–3.6) 5.9(5.6–6.2) 3.7(3.5–3.8) 6.0(6.0–6.1) 13.6(13.4–13.8) 25.5(25.3–25.7) 0.33 200 7.3(6.4–8.3) 3.7(3.5–3.8) 6.5(6.2–6.9) 3.5(3.3–3.7) 5.9(5.8–6.0) 13.2(13.0–13.4) 25.2(24.9–25.4) 0.35 100 7.8(6.1–9.6) 4.2(4.0–4.4) 7.1(6.6–7.6) 3.4(3.1–3.7) 5.6(5.5–5.8) 12.9(12.6–13.2) 24.5(24.2–24.9) 0.37 50 6.9(4.4–10.1) 4.6(4.2–5.0) 8.3(7.5–9.1) 3.4(2.9–3.9) 5.4(5.2–5.6) 12.2(11.7–12.6) 23.7(23.1–24.2) 0.38 NOTE.—Themodelofsequencingerrorsisused,withhandsparametersscaledby103andresultsforhHCGOMarenotshown.Forthesamples,errorratesarenot estimatedbutarefixedattheposteriormeansestimatedforthefull-lengthdataset(table2,secondrow).PSGisthetreemismatchprobabilityforH,C,andG. sensitivetotheeffectsofwithin-locusrecombination.How- a divergence time for calibration. We calibrated to the ever, ignoring model violations such as within-locus re- human–chimpanzee divergence time (T ), with 2 values HC combination may have caused the 95% CIs in our main used:4and6Myr.Fourmillionyearsistheminimumtime analyses to be too narrow. consistentwithwidelyacceptedfossilevidenceforH–Cdi- This analysis also highlighted the superiority of the vergence(Leakeyetal.1995),whereasmorecontroversial full likelihood method over the tree-mismatch method fossilevidence(Senutetal.2001;Brunetetal.2002)might (see Introduction). Our Bayesian analysis produced quite requireanearlierdivergenceatabout6Myr.Thegeneration reasonableestimatesevenwheneachlocuscontainedonly time was assumed to be g 5 15 years. Modern humans 50sites,withonaveragelessthanonepairwisedifference have longer generation times, but ancestral species were betweenH,C,andG.TheprobabilityP ofmismatchbe- probably physically smaller, perhaps similar to modern SG tweenthespeciestreeandthegenetree,calculatedfromthe macaques,whichhaveg(cid:1)11years(Gage1998).Thecali- parameter estimates, increased only slightly from 31% for brated results are shown in table 8. thefull-lengthdatato38%fortheshortestloci(table6and CalibrationtoT 5 4Myrimpliedamutationrateof HC fig.2).However,theprobabilityP ofmismatchbetween 0.98(cid:3)10(cid:2)9persiteperyearfortheNeutraldataset.This SE thespeciestreeandtheestimatedgenetree,whichisused was similar to the commonly used rate l 5 10(cid:2)9 per site bythetree-mismatch method,increasedto59%(fig. 2).It per year (e.g., Takahata et al. 1995; Rannala and Yang was noteworthy that even with the full-length data, P 2003). The population size for the HC ancestor (N ) SE HC (0.41) was considerably higher than P (0.31), so errors was estimated to be ;105, about 10 times larger than esti- SG in tree reconstruction have a considerable impact on the matesformodernhumans.ThisisconsistentwithTakahata’s simple tree-mismatch method. (1993) early estimate from analysis of MHC alleles and is muchlargerthantheestimatesofYang(2002)andRannala and Yang (2003) from the data of 53 loci of Chen and Li Analysis of the Complete and Inert TE Data Sets (2001). The latter data set may be atypical in having an From the above analysis, we consider the random- ratesmodelincorporatingsequencingerrorstobeourbest 0.6 model.ThiswasusedtoanalyzetheCompleteandInertTE data sets, with results shown in table 7. For Complete, es- timatesofhsandsswereallslightlysmallerthanthosefrom es P theNeutraldataset.Thiscanbeexplainedbythepresence ch 0.5 SE at ofasmallproportionofmaskedsitesunderpurifyingselec- m s tion.Theratiosofsestimatesbetweenthe2datasetssug- mi gested that the mutation rate of Complete was about 0.97 e e 0.4 times that of Neutral. Aside from this rate difference, the of tr P parameter estimates were almost identical between the 2 n SG o data sets. The estimates were evidently robust to any vio- orti lation of the neutral assumption in this large data set. p 0.3 o Estimates of hs and ss from the Inert TE data set are Pr comparablewiththosefromtheNeutralandCompletedata sets (table 7). However, the estimates suggest some varia- 0.2 tion in mutation rate among the 4 classes of TEs. We dis- 0 100 200 300 400 500 cuss the results below using a calibration to the H–C Mean locus length (bp) speciation time. FIG.2.—Errorsinphylogenetictreereconstructioninflatethespecies tree–genetreemismatchprobability.P istheprobabilityofmismatch SG betweenthespeciestreeandthegenetree,derivedfromthecoalescent Speciation Times and Ancestral Population Sizes parameter estimates by equation (1), whereas P is the probability of SE mismatchbetweenthespeciestreeandtheestimatedgenetree.Bothrefer TotranslatehsandssintopopulationsizeNsanddi- to the phylogenetic relationship among H, C, and G. Data sets with vergencetimeTs,itisnecessarytouseamutationratelor progressivelyshorterloci,usedintable6,areanalyzed. 1988 BurgessandYang PSG 0.290.310.32 nsfor ucensusaunadllayssmmaallllgveanreiatnrecee–ascpreocsiseslotcreieinmsiesqmuaetncchepdroivpeorrgtieonn- a me (P 5 0.203)(Yang2002;Sattaetal.2004).Estimatesfor SE 6)9)9) or earlier ancestral populations N (55,000) and N eO (1.4–1.1.61.(–(1.5–1. eposteri (ti8m5e,0s0w0)erweeersetismoamteedwhtoatbsem6a.l4leHMrC.yGTrhfeorspgeocriiellsa,di1v4e.r6gHeMCnGcyOer 1.51.71.7 atth for orangutan, and ;25–30 Myr for macaque. d e WhencalibratedtoT ,estimatesfromtheComplete fix HC –1.1)1.2)––1.3) were dtiaotnaosfetthweesraemveercyalsiibmraitliaorntotoththeeN4ecultarsasledsaotafTseEt.sAsupgpgleicsats- eG 1.01.01.1 but that they have different mutation rates (table 8). LINE ((( d 1.11.11.2 mate elements have the lowest mutation rate, with the rate esti for DNA transposons 6% higher, LTR transposons 11% eC (0.1–0.2)0.10.3)(–(0.2–0.4) eswerenot himsigaathceecsro,ouafnnNtdesdSafInoNdrE,Tsths1e,7w4%hTihcEhigchaleraers.sseWosmhgeeawnvhethavitsehrriyagthseiemdriitflhfaearrneensthctiee- 0.10.20.3 bclass aenstdimleastsefsorfrTosm).OthneeNreeaustoranlcdoautladsbeett(hbeysmabaoluletr2m5%eanfolorcNuss u sHCGOM 28.2(28.1–28.4)29.229.029.5)(–30.9(30.6–31.2)29.6(29.0–30.0)33.8(33.3–34.2)32.9(31.8–33.3)30.9(29.9–31.6)25.5(24.3–26.6) ee,,andforTEsCGOforH,C,andG. lwimfemoniurtpghttaahticthCneito-aN(lnol3oiesf2cbuu1rlptoasretvcairrusnle.sgdsc5iaoltt0eteoman8TbpsgebHietnhtprCaw)).ty,5iaeossaon6or,tniMhsliaymryte2prdt/hli3uynecisanemtsegdlaoad(tdrsh,geeatelehti,neatagatmlbl0elu.rpe6rtoao56ptri(cid:3)uofdolnua1rter0itao(cid:2)httnoee9 ey DifferentDataSets sCGHCGO 1–6.2)13.9(13.9–14.0)26.4)14.314.214.5)–(–14.4(14.2–14.6)1–6.3)7–5.8)13.1(12.9–13.3)6–6.8)16.3(16.0–16.6)1–6.4)14.3(14.0–14.7)7–6.0)13.2(12.7–13.6)1–5.6)11.7(11.1–12.3) harenotshown.ParametersHCGOMisthetreemismatchprobabilitPSG sttfMs2ihhnoie0zeerey0hemrhm1soco;TaimafanlmonhBcigibrdneailrnrqyouadcmouintiadiieveadilonietcanbrsrcaneergloqcdaTotemuetuwHniasecopCcltde.oneaod(5rt2TwtsiTi0bt.HmHhn0l4ECeeC2emGbMs)e5OeofwsrmoMydtsiser4be)am.srleyieTMaslrwget1bheieye.eeet5mrhrvtoddetiemaidtiumd3veoaea7eneynltr.csteogMoeb(-aoe2tesyhnl0i(ricolnp0Sr.aelder7redeag)nt,hfgieeudmieagrtaialsehstsbcheeouuolroternsftur,sidang4aeaetlgh5desr. CIs(inparentheses)forParametersObtainedfrom hhssHCGHCGOHCH 3.2(3.1–3.2)4.8(4.7–5.0)3.8(3.8–3.9)6.1(6.3.33.23.4)4.94.75.2)3.93.84.0)6.36.(–(–(–(3.7(3.6–3.8)6.1(5.8–6.4)3.7(3.6–3.8)6.2(6.3.3(3.2–3.5)5.6(5.2–6.0)3.4(3.2–3.5)5.7(5.4.5(4.3–4.7)7.1(6.6–7.7)4.0(3.8–4.2)6.7(6.3.7(3.5–4.0)5.9(5.3–6.6)3.8(3.5–4.0)6.3(6.3.6(3.3–4.0)6.4(5.5–7.4)3.6(3.3–3.9)5.9(5.2.0(1.6,2.5)3.2(1.9,4.4)3.1(2.8–3.9)5.4(5. 3hseerrorsisused,with,,andparametersscaledby10andresultsforwerefixedattheposteriormeansforthedataset.NeutralXNeutral tplcbtitmataTptahohcahhonnnhyhhategeaeedddciiiulinosieleifncihsmtnsinosntcratn,buhtgaareoloteesmmieaiirarapnarntiengrs.pgareneevecdlenrrucddtsysooehipiplhslrsfinrioefutoeoraneeohgcrectrndngesriHstiodcatouisfiptilniCohgniulmoicolnagniegwnmrtgptamohecaodsbeanpmorosiorocmdynmtamrdthmtartnghoraieioefmnezonutrordesgepgtnfscotsrepseii,hhsanmoroewidohoinllncntaeeilufaiisnoshdsdemntweegphouoesancvaiaieesftmfeectouagpnreustaile.Hnaehrsntutlteno,treT–httioaafiserilpCemhog,dlrrxoaepenerpeiwfrvlerngs.adryroeaegpeoionhSmsttunisemmctaheoiotetcnoethumnnibebisrpooaatscpeyteehwontlrrehilaeaselogymfeicyovliorsnerrnhmfeofoomen.poncmrmnaeooastspGopetetrfplohimhpyofdeceecraoeohnmmpicmolclmtutielihouhthseooeogteogumleailrrsdnhdcer-----t. and95% hHC (5.4–5.9)5.76.6)(–(6.3–7.3)(5.9–7.3)(6.6–8.5)(6.1–8.6)(4.6–7.7)(2.0–3.3) ofsequencingset.Thosefor XSpCechiWarotiemonfiorssotmceonDsiivdeerrgedentcheearnadtioHoufmmane–aCnhgimenpeatinczedeiver- Means 5.66.16.86.67.57.36.12.6 hemodeldataTE lgaetnedcedbieretwctelyenfXromandJCth6e9audtiostsaonmceese(sAti)m, dat(eXs)/di(nA),spceaclcieus- Table7Posterior CompleteNeutralInertTELINESINELTRDNANeutralX N.—TOTEtheentireInert c00coe..78nm04tp–mfa0oro.ir8ds5oeHlnb–wseCtiw(,tthae0boe.8lnne0ea–1an0ny.l8caae2spsttebrraeaoltnwwpdoe).empnuTalchaaatneiqyoudneo(X.stih)Uz/edenr(,Adt2e)hrearaapdte(icXoso,)a/wdale(naAsds-)

Description:
Estimation of population parameters for the common ancestors of humans that ancestral populations were 5-10 times larger than modern humans
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.