Copyright©2005bytheGeneticsSocietyofAmerica DOI:10.1534/genetics.104.032276 The Structured Ancestral Selection Graph and the Many-Demes Limit Paul F. Slade*,1 and John Wakeley† *School of Mathematics and Statistics, School of Biological Sciences, SUBIT, University of Sydney, New South Wales 2006, Australia and †DepartmentofOrganismicandEvolutionaryBiology,HarvardUniversity,Cambridge,Massachusetts02138 ManuscriptreceivedJune21,2004 AcceptedforpublicationOctober28,2004 ABSTRACT Weshowthattheunstructuredancestralselectiongraphappliestopartofthehistoryofasamplefrom apopulationstructuredbyrestrictedmigrationamongsubpopulations,ordemes.Theresultholdsinthe limitasthenumberofdemestendstoinfinitywithproportionatelyweakselection,andwehavealsomade the assumptions of island-type migration and that demes are equivalent in size. After an instantaneous sample-size adjustment, this structured ancestral selection graph converges to an unstructured ancestral selectiongraphwithamutationparameterthatdependsinverselyonthemigrationrate.Incontrast,the selectionparameterforthepopulationisindependentofthemigrationrateandisidenticaltotheselection parameterinanunstructuredpopulation.Weshowanalyticallythatestimatorsofthemigrationrate,based onpairwisesequencedifferences,derivedundertheassumptionofneutralityshouldperformequallywell in the presence of weak selection. We also modify an algorithm for simulating genealogies conditional onthefrequenciesoftwoselectedallelesinasample.Thispermitsefficientsimulationofstrongerselection than was previously possible. Using this new algorithm, we simulate gene genealogies under the many- demes ancestral selection graph and identify some situations in which migration has a strong effect on thetimetothemostrecentcommonancestorofthesample.Wefindthatasimilareffectalsoincreases thesensitivityofthegenealogytoselection. KIMURA(1983)stronglypromotedtheideathatthe Hudson (1983), and Tajima (1983). The reason for abundantgeneticvariationseeninnearlyeveryspe- thisisclear:itismuchsimplertopredictthefrequencies ciesstudiedmustbeneutral.OhtaandKimura(1971) ofselectedallelesusingdiffusiontheorythanusingthe suggestedaless strictversionofthis inwhichpolymor- backward-timeapproachofthecoalescent.Forneutral phisms are explained by the constant input of nearly loci,thecoalescentprovidesasimpleandusefuldescrip- neutralmutations,i.e.,variantswithselectiveadvantages tionofthegeneticancestryofasampleofgeneticdata— ordisadvantagessmallerthanthereciprocalofthepop- thegenealogyforshort—fromalargewell-mixed,orpan- ulationsize.Afteragreatdealofdebateandmanyanaly- mictic,populationofconstantsizethroughtime.Dueto ses,summarizedinGolding(1994),therearenowfew the close connection between genealogies and genetic adherentstothestrictneutralmutationhypothesis.Mo- data,andtotheeasewithwhichgenealogiescanbesim- lecular techniques currently allow huge numbers of ulated,agrowingsetofcomputationaltoolsusethecoales- polymorphismstobeassayedwithrelativeease,andthe cent to make inferences about population history from resulting genomic data can be used to estimate the DNAsequences;seeStephens(2001)andTavare´(2004) strengthofselectionassociatedwithgeneticdifferences for reviews. (Sawyer and Hartl 1992; Bustamante et al. 2002). The coalescentis robust to manykinds of deviations Recent statistical inferences made from such data em- fromitsunderlyingassumptions(Kingman1982b;Mo¨hle phasizetheimportanceofselectionandshowthatselec- 1998c),butitdoesnotholdwhenmembersofthesam- tive advantages or disadvantages on the order of the ple, or the ancestral lineages of the sample, are not reciprocalofthepopulationsizemaybefairlycommon exchangeable.Infact,therelativesimplicityoftheneu- (Sawyer et al. 2003). tralcoalescentprocessflowsdirectlyfromtheexchange- Themethodsusedintheseworkstoestimateselection ability of the lineages. Exchangeability, which results parametersfromintraspecificgenomicdatarelyonpre- in the model from the assumptions of neutrality and dictions of sample allele frequencies derived from for- panmixia,meansthatthestatisticalpropertiesofasam- ward-time diffusiontheory. They do notuse the ances- pledonotdependonhowthesampleditemsarelabeled tralgeneticprocessknownasthecoalescent,whichwas (Kingman 1982b; Aldous 1985). Possible “labels” in- introduced in the early 1980s by Kingman (1982a,b), cludethegeographiclocationswherethesampleswere collectedortheallelicstatesofthesamplesiftheseare known. Whensamples arenot exchangeable,a related 1Correspondingauthor:SchoolofMathematicalSciences,University ofAdelaide,SA5005,Australia. E-mail:[email protected] but more complicated model called the stuctured coa- Genetics169:1117–1131(February2005) 1118 P.F.SladeandJ.Wakeley lescent exists (Slatkin 1987; Strobeck 1987; Noto- able to identify cases in which simple models may be hara 1990; Wilkinson-Herbots1998), and computa- applied even though the underlying dynamics appear tional methods of inference are being developed for complicated. We present a rigorous derivation of the this model as well (Nath and Griffiths 1996; Beerli ASG with very many demes (subpopulations) in an is- and Felsenstein 1999, 2001; Bahlo and Griffiths landmodelofmigration(Wright1931)andshowthat 2000; De Iorio and Griffiths 2004). itpossessesasimplestructureakintothatunderneutral- Genealogies in the presence of strong selection, i.e., ity (Wakeley 1998). Five parameters collapse into just withselectiveadvantagesordisadvantagesmuchgreater three(oneeachformigration,selection,andmutation) thanthereciprocalofthepopulationsize,canbemod- in the limit as the number of demes tends to infinity. eled using the structured coalescent with the subdivi- Interestingly,thepopulationratesofselectionandmu- sions being the alternative alleles (Kaplan et al. 1988, tation scale differently (see below) in the presence of 1989). For weak selection, Krone and Neuhauser population subdivision. This was found previously in (1997)andNeuhauserandKrone(1997)showedthat forward-time analyses (Wakeley 2003; Wakeley and thegenealogyofasampletakenatrandomwithrespect Takahashi2004),buttheresultismoreintuitiveinthe to allelic variation is described by an exchangeable an- presentcase.Wealsodescribeanewlyenhancedprocess cestral process. The ancestral selection graph (ASG) is for the simulation of genealogies conditional on the abranching-coalescingrandomgraphwithinwhichthe frequenciesofallelesinthesample.Weshowusingcon- simple,bifurcatinggenealogyofasampleisembedded. ditionalsimulationsthatpopulationsubdivisionandmi- TheelegantexchangeabilityoftheASGcomeswithacost: grationamongverymanydemesaccentuatetheaction itlendsitselfonlytoverycomputationallyintensivetech- of selection on the genealogy. niques. In addition, there appears to be very little about genealogiesintheASGthatdiffersgreatlyfromgenealo- THEORY gies under neutrality (Krone and Neuhauser 1997). For these reasons, and also to be able to control for Thestructuredancestralselectiongraphistheances- sampling, recent extensions to the ASG have modeled tralprocessforaclassofforward-timemodels.Following genetic ancestry conditional on the frequencies of se- Kroneand Neuhauser(1997)webegin bydescribing lected alleles in the sample (Slade 2000a,b; Stephens acontinuous-timemodelinwhichreproductionoccurs and Donnelly 2003; Barton et al. 2004). This leads accordingtotheMoran(1958,1962)model,butwhere to dramatic improvements in computation time while thepopulationissubdividedintodemes(subpopulations) preserving some degree of exchangeability among the connected by migration. There are D demes, and in a sampledlineages(Slade2000a).Theconditionalsimu- general model these might have sizes N, i (cid:1) 1, ..., D. i lation of selected genealogies by Slade (2000a,b) en- Weassumethateachofthe(cid:1)Di(cid:1)1Ni lineagesinthepopu- ables the timing structure of common ancestry to be lation can be in one of two possible allelic states, A 1 quantified.IncontrasttotheunconditionalASG,gene- and A , with relative rates of reproduction (cid:2) and (cid:2), 2 1 2 alogiesconditionaluponallelefrequenciesinthesam- respectively. However, as in Krone and Neuhauser plemaybeeitherlargerorsmalleronaveragethangene- (1997),webeginwithadescriptioninwhichthestatesof alogiesunderneutrality(Slade2000a);seealsothereview lineagesarenotspecifiedandsoeachlineageundergoes of Slade and Neuhauser (2003). The extent of these two kinds of reproduction (birth/death) events. Type differencesdependsstronglyontheinitialsamplecon- 2birth/deatheventsoccurwithrate(cid:2) (cid:3)(cid:2) perlineage. 2 1 figuration of alleles, particularly for small mutation These represent the action of selection since they are rates. Under neutrality, samples in which there are ap- realized only if the lineage that reproduces is of the proximately equal numbers of each allele have longer favored allelic type at the time of the event. Type (cid:4) timestocommonancestrythanmoreunbalancedsam- birth/death events occur with rate (cid:2) per lineage. The 1 ples.However,theselectiveeffectongenealogiesforsam- majority of birth/death events will be of this type, and plesin whichthere areapproximately equalnumbers of theycanbethoughtofasneutralinthesensethatthey each allele is greater than that of more unbalanced arerealizedregardlessoftheallelictypeoftheparental samples. We show that restricted migration can cause lineage.Note thatat thispoint, withoutspecifiying the unbalanced samples to be converted to balanced sam- allelic types, lineages are exchangeable within demes ples,viatheinstantaneoussample-sizeadjustmentmen- but are not exchangeable between demes. Note also tioned above, and thus the time to common ancestry that, while this is a model of directional selection, it is under selection is reduced. possibletoincludefrequencydependence(Neuhauser Our purpose is to consider the ASG in the presence 1999) or general diploid selection (Slade 2000a). of population subdivision and migration. Population Subdivision is mediated by a collection of D (cid:5) D structureisevidentinmanygeneticdatasetsbutmodel- migration probabilities, m , for i, j (cid:1) 1, ..., D. When ij ing it presents difficulties (Slatkin 1985; Avise 2000; a lineage in deme j reproduces, either by a type (cid:4) or Hey and Machado 2003), creating impediments both by a type 2 event, the individual its offspring will re- toinferenceandtounderstanding.Therefore,itisvalu- place (i.e., the individual picked to die) is chosen uni- Many-DemesAncestralSelectionGraph 1119 formlyatrandomfromamongtheN lineagesindeme Neuhauser (1997) refer to as the percolation diagram. i i with probability m . With probability 1 (cid:3) m, where This is shown in Figure 1 and is the structured analog ij m (cid:1) (cid:1)Di(cid:1)1mij, the individual to die is chosen uniformly ofFigure1inKroneandNeuhauser(1997).Following atrandomfromwithinthesamedemeasthereproduc- a single lineage, either forward or backward in time, ingindividual.AsusualinMoran-typemodels,theindi- 2-arrows are encountered at rate (cid:2)s and (cid:4)-arrows are 1 D vidualchosentoreproducemightalsobetheonecho- encountered at rate (cid:2). Each arrow has a probability m 1 sentodie.Althoughthemodeldoesallow“migration” ofconnectingtoalineagesampleduniformlyatrandom backtothesamedeme(withprobabilitym )theeffectof from the entire population and probability 1 (cid:3) m of ii thisbecomesnegligibleinthelarge-Dlimitweconsider connecting to a lineage sampled uniformly at random below. from the same deme. Thus, there are four kinds of The general model described above forms the basis branchesandtheseappearrepeatedlyinthehistoryof forastructuredancestralselectiongraph.Thiswouldbe every lineage with rates (cid:2)(1 (cid:3) m), (cid:2)m, (cid:2)s (1 (cid:3) m), 1 1 1 D obtainedviatheusuallarge-Nlimit,withDfinite,inthe and (cid:2)s m. Note that (cid:2) is simply a scaling factor that 1 D 1 same way that the structured coalescent is obtained in specifiestheunitsinwhichtimewillbemeasured;(cid:2) (cid:1) 1 theneutralcase(Notohara1990;Wilkinson-Herbots 1 corresponds to the usual notion of measuring time 1998).Thatis,wewouldletN (cid:1) (cid:1)Di(cid:1)1Ni andassumethat in generations. c (cid:1)N/N,M (cid:1)Nm ,(cid:6)(cid:1)(cid:2) (cid:3)(cid:2),and(cid:7)(cid:1)Nu,where Thedualorancestralprocessisobtainedbyfollowing i i ij i ij 2 1 u is the neutral mutation rate, are all finite as N → ∞, lineages back in time, i.e., up the graph in Figure 1. and time is measured in units of N generations, where Because2-arrowsarerealizedonlyiftheparentalallele a generation is defined to be (cid:2)(cid:3)1 time units. Note that is of the favored type, both lineages must be followed 1 acrucialassumptionofthestructuredcoalescentisthat intheancestralprocess.Asinglegenealogyisobtained, migrationoccursonlybetweentheDdemesenumerated in the unconditional ancestral process, by tracing lin- intheinitialsample,andtheancestryisthereforealways eages back to the ultimate ancestor, assigning its type restricted to those particular demes. from the equilibrium distribution of allele frequencies Wedonotpursuethislarge-Nlimithere,butinstead (see below), and then following lineages forward in consideralimitingancestralprocessthatapproximates time,withmutation,andtrimmingoffbranchesappro- the behavior of a population divided into very many priately(KroneandNeuhauser1997;Neuhauserand demes and in which selective differences are small. Krone1997).Ratherthanthisunconditionalancestral Thus,weassumethatthesamplesizeissmallrelativeto process,weconsiderthegenealogyofasamplecontain- thenumberofdemesinthepopulation,ratherthanto ing known numbers of different alleles. This leads to thedemesize.Historically,migrationcantakeancestors moreefficientsimulationofselectedgenealogies.Using toanydemewithinthepopulation.Thisfollowsrecent theconditionalapproach,itispossibletolabelancestral coalescent work under the assumption of neutrality lineages as either virtual or real, and it is necessary to (Wakeley 1998) and work on the forward-time diffu- trace lineagesback only tothe most recentcommon an- sion approximation for allele frequencies in the pres- cestorofthesample,i.e.,amongthereallineages,rather enceofselection(Wakeley2003;WakeleyandTaka- than all the way back to the ultimate ancestor (Slade hashi 2004). We set (cid:2) (cid:1) (1 (cid:8) s )(cid:2), and we use (cid:2)s 2000a). There is also a minimal representation of the 2 D 1 1 D in place of (cid:2) (cid:3) (cid:2) for the rate of type 2 birth/death ancestral process that reduces the number of possible 2 1 events below. This corresponds to a model in which eventsrequiredateachancestraltransition(Sladeand therearetwoalleles,A andA ,andalleleA hasfitness Neuhauser2003).Wedescribehowthisisdoneinthe 1 2 2 equal to 1 (cid:8) s while A has fitness equal to 1. We put present, structured model in methodology. D 1 a subscript on the selection coefficient in recognition Theessenceofthelarge-Dresulthere,asintheneu- ofthefactthatthelimitweseekhasDs(andDu)finite tral case, is that the rates at which events occur that asD→∞.FollowingLessardandWakeley(2004),the affect the history of a sample from the population are large-N, large-D limit can also be studied. Similar limit verydifferentwheneverylineageisinaseparatedeme resultshererequiremorestringentconditions:roughly, than when at least one deme contains more than one that D goes to infinity faster than N (see appendix a). lineage.ThedependenceonDissuchthataseparation- For simplicity, we assume that migration occurs ac- of-timescales result applies, as, for example, in Mo¨hle cordingtoWright’s(1931)islandmodel,butadapted (1998a,b). Because of this, the history of a sample can forMoran-typereproductionasinWakeleyandTaka- be divided into two phases (Wakeley 1999). The first hashi(2004).Thatis,weassumethatm (cid:1)m/Dforall is a short scattering phase in which coalescent events ij iandj,andthatN (cid:1)Nforalli.Thus,wehaveamodel occurbetweensamplesfromthesamedemeandmigra- i in which there are D demes, each of size N, and each tioneventsinwhichlineagesmovetounoccupieddemes of which receives a migrant from the total population (i.e.,demesthatdonotcontainanyancestrallineages). withprobability mat eachbirth/death event.As inthe Thescatteringphaseoccursonafasttimescaleandbe- unstructured ancestral selection graph, it is helpful to comes an instantaneous sample size adjustment in the consider a graphical representation, which Krone and limit as D tends to infinity. The scattering phase ends 1120 P.F.SladeandJ.Wakeley Figure1.—Graphicaldepictionofthestruc- tured selection process for the case of D (cid:1) 3 demes. when allremaining lineages are indifferent demes. At usedtoshow thatthecollectionofdemes isalwayssuf- this point, the collecting phase begins in which pairs of ficiently close to (cid:9) that changes in X depend only on lineagescometogetherintoasingledemeandcoalesce, the ensemble properties E(cid:9)[K] (cid:1) (cid:1)Nk(cid:1)0k(cid:9)k (cid:1) Nx and eventuallyfindingacommonancestoroftheentiresam- Var(cid:9)[K](cid:1)x(1(cid:3)x)N2/(Nm (cid:8)1(cid:3)m)andthatthedif- ple.Themigrationeventsthatcouldbeignoredinthe fusion of X isidentical to the usual unstructured diffu- scatteringphase,i.e.,thoseinwhichlineagesmovetooc- sionexceptthatthetimescaleisincreasedbythefactor cupieddemes,arenowessentialsinceacoalescentevent 1 (cid:8) (1 (cid:3) m)/(Nm) (Wakeley and Takahashi 2004). canoccuronlybetweenapairoflineagesiftheyarein By considering overall limiting rates of type (cid:4) and thesamedeme.Weusethescattering/collectingtermi- type2birth/deathevents—whichare(cid:2)NDand(cid:2)s ND 1 1 D nology even though we have incorporated selection. (1(cid:3)x),respectively—andconditioningonthenumber Forward-timeanalysis:Tosimulategenealogiesinboth of copies of allele A in the deme where the reproduc- 1 theconditionalandtheunconditionalASG,itisneces- tion event occurs, it can be shown that sarytoknowtheequilibriumdistributionofthefrequen- ⎧ 1 Nm ciesof A1and A2in thepopulation.In fact,the ASGis ⎪X(cid:8)ND withrate(cid:2)1NDx(1(cid:3)x)Nm(cid:8)1(cid:3)m ariummodpeolpsupleactiifionca.lTlyhfeorreawsiallmbpelenofreoqmuisluibcrhiuamnweqituhioliubt- ⎪⎪ (cid:8)(cid:2)ND(1(cid:3)x)u (cid:2)1(cid:8)2x Nm (cid:3) ⎪ 1 D Nm(cid:8)1(cid:3)m mutation, and here we follow Krone and Neuhauser ⎪ (cid:8)O(s u ) (1997) in assuming that there is a probability of muta- X→⎨ 1 D D Nm ⎪X(cid:3) withrate(cid:2)NDx(1(cid:3)x)(1(cid:8)s ) tionuDperbirth/deathevent.Wenotethatasymmetric ⎪ ND 1 D Nm(cid:8)1(cid:3)m mutation can also be accommodated (Slade 2000a). ⎪ (cid:2) Nm (cid:3) The forward-time dynamics of allele frequencies in a ⎪ (cid:8)(cid:2)1NDxuD1(cid:8)2(1(cid:3)x)Nm(cid:8)1(cid:3)m ⎪ subdivided population may be complicated, but in the ⎩ (cid:8)O(sDuD). (2) caseofalargenumberofdemestheyarenearlyassimple We let as in an unstructured population (Wakeley 2003). In amodelverysimilartotheoneweconsiderhere,Wake- ND(cid:4) 1 (cid:3) m(cid:5) (cid:2) (cid:1) 1 (cid:8) (3) leyandTakahashi(2004)showedthatthefrequencies 1 2 Nm of alleles in the total population change according to adiffusionprocessthatisidenticaltothediffusionpro- and we further assume that cessinanunstructuredpopulation,onlywithadifferent NDs → (cid:6) and NDu [1 (cid:8) (1 (cid:3) m)/(Nm)] → (cid:7) timescale.Further,asthesefrequencieschange(slowly) D D the collection of demes closely tracks an equilibrium as D → ∞. (4) distribution of allele frequencies. Then, the diffusion process for the frequency of allele Let the random variable X(t) be the frequency of A has drift parameter a(x) (cid:1) (cid:3)((cid:6)/2)x(1 (cid:3) x) (cid:8) alleleA1attimet,andletxrepresentaparticularvalue ((cid:7)1/2)(1 (cid:3) 2x) and diffusion parameter b(x) (cid:1) x(1 (cid:3) ofX(t).BecauseweconsiderthelimitD→∞whileDs D x). This process has a unique stationary distribution and Du remain finite, mutation and selection do not D appear in the equilibrium h(x) (cid:1) Bx(cid:7)(cid:3)1(1 (cid:3) x)(cid:7)(cid:3)1e(cid:3)(cid:6)x, (5) (cid:6) (cid:4)N(cid:5)(cid:4) Nmx (cid:5) (cid:4)Nm(1 (cid:3) x)(cid:5) (cid:4) Nm (cid:5) whereBisanormalizingconstant;thatis,h(x)satisfies (cid:9) (cid:1) (1) k k 1 (cid:3) m (k) 1 (cid:3) m (N(cid:3)k) 1 (cid:3) m (N) 0 (cid:1) 1 d2 [b(x)h(x)] (cid:3) d [a(x)h(x)] (6) 2dx2 dx (Wakeley and Takahashi 2004), which here is the fractionofdemesinthepopulationthatcontainkcopies (Wright 1949; Kimura 1955). With reference to Fig- ofalleleA andinwhicha (cid:1)a(a(cid:8)1)···(a(cid:8)k(cid:3)1) ure1,thetwo-partequilibrium(1)and(5),whichpre- 1 (k) isanascendingfactorial.Clearly,theequilibriumvector dicts the distribution of A among demes and across 1 (cid:9)doesdependonthefrequencyofalleleA inthetotal the total population, is what would be observed by as- 1 population, and this will change over time. However, signingA ’s andA ’sto thelineagesand thenrunning 1 2 Theorem 3.3 of Ethier and Nagylaki (1980) can be the process forward a very long time. Many-DemesAncestralSelectionGraph 1121 TABLE1 Theninepossibleresultsoftype(cid:1)andtype2birth/deatheventsinthehistoryofnlineagescurrentlyinndifferentdemes Rate Event Outcome Magnitude U (cid:1)(cid:2)n(1(cid:3)m) (cid:4),nomigration Nochange O(1) 1 1 D(cid:3)(n(cid:3)1) U (cid:1)(cid:2)nm (cid:4),migrationtou-deme Nochange 2 1 D n(cid:3)1 1 U (cid:1)(cid:2)nm (cid:4),migrationtoo-deme,coalescent Coalescentevent O(1/D) 3 1 D N n(cid:3)1N(cid:3)1 U (cid:1)(cid:2)nm (cid:4),migrationtoo-deme,nocoalescence Possiblecoalescent 4 1 D N 1 U (cid:1)(cid:2)s n(1(cid:3)m) 2,nomigration,coalescent Nochange(self-collision) 5 1 D N N(cid:3)1 U (cid:1)(cid:2)s n(1(cid:3)m) 2,nomigration,nocoalescence Possiblebranch 6 1 D N D(cid:3)n U (cid:1)(cid:2)s nm 2,migrationtou-deme Branchevent 7 1 D D n 1 U (cid:1)(cid:2)s nm 2,migrationtoo-deme,coalescent Collision O(1/D2) 8 1 D DN nN(cid:3)1 2,migrationtoo-deme,nocoalescence Possiblebranch U (cid:1)(cid:2)s nm 9 1 D D N U-demeando-dememeanunoccuppieddemeandoccupieddeme,respectively. The collecting phase: Consider the case in which n coalescenceaswefollowthehistoryofthesampleback lineagesarein,oraresampledfrom,ndifferentdemes. in time. By tracing the lineages back in time without knowing It is appropriate that we have ignored mutations theirallelictypes,wefindasimpleancestralprocessin aboveandinTable1becauseherewearedealingwith thelimitD→∞andgenerateanintuitiveunderstanding the unconditional ancestral process. Mutations will oc- forthedifferentscalingofmutationvs.selectionin(4) cur with probability u , 0 (cid:10) u (cid:10) 1, at both types of D D andinWakeley(2003)andWakeleyandTakahashi arrows or reproduction events. Along a single lineage, (2004).Thefourkindsofarrows,orreproductionevents, theyoccurwithrate(cid:2)u (cid:1)ND(1(cid:8)s )u [1(cid:8)(1(cid:3)m)/ 2 D D D discussedabovewillbeencounteredbythesenlineages (Nm)]/2,whichbecomes(cid:7)/2inthelimitD→∞.Asin and will sometimes move them to occupied demes so Krone and Neuhauser (1997) and Neuhauser and thattheymightcoalesce.Wecanrecognizeninepossible Krone (1997), we superimpose this mutation process eventsforsuchasample,andthesearelistedinTable1. ontheunconditionalancestralprocess.Intheconditional For example,the fourthkind of eventis that a(cid:4)-arrow ancestralprocessweconsiderbelow,itisnecessarytotreat takes one of the n lineages to one of the other n (cid:3) 1 mutation, coalescence, and branching simultaneously. occupied demes but does not connect to the resident When D is large, the first two events in Table 1 will ancestrallineage.Theresultisthatnowthenlineages, account for the vast majority of events. These events while still distinct, reside in n (cid:3) 1 demes. Thus, at the have rates of O(1), which here means that they have a nexteventthetwolineagesthatresidetogetherinone finite and nonzero limit as D → ∞ for a given (cid:2). It is 1 deme can coalesce. The interpretations of the other easy to see that this is true because the rates of these eventsinTable1areequallystraightforward.Notethat events depend on D only through (cid:2). These events are 1 the events are categorized according to their effect on lineage switches within and between demes, but ones theancestral lineagesand alsoby theirprobabilities of that preserve the basic sample structure of n lineages occurring with reference to the limit D → ∞ (while in n different demes. They do not change the rates at Ds remains finite). As in the unstructured ancestral which events occur and Table 1 continues to apply. D selection graph, when a 2-arrow is encountered, both Note that with other kinds of (non-island) population paths are followed and the lineage splits into two lin- structure, these events would change the state of the eages. We are interested in the rates of branching and sample by moving lineages among different types of 1122 P.F.SladeandJ.Wakeley demes (Wakeley and Aliacar 2001). Here, they can where(cid:4)n(cid:5) (cid:1) n(n (cid:3) 1)/2and(cid:6)(cid:1)NDs .Again,thefirst 2 D be ignored due to the symmetry of the island model. twotypesofeventsdonotchangethestateofthesample, Thenextmostnumerouseventswillbeevents3–7,and soitisnotproblematicthattheirratesbecomeinfinite. these have rates of O(1/D), which means that these Itsimplymeansthattheancestrallineageswillencoun- rates will approach zero as D → ∞ for a given (cid:2)1, but ter many birth/death events before anything of rele- that their rate of approach to zero will be inversely vance happens to the sample. Event 3 is a coalescent proportional to D. Again, the limiting process we seek event in which the number of lineages decreases from is for DsD finite as D → ∞, so that sD is of O(1/D). Four nton(cid:3)1andthesen(cid:3)1lineagesareallindifferent oftheseevents—3,4,6,and7—dofundamentallyalter demes. Event7 is a branchingevent, in whichcase the the state of the sample. These are migration events to samplegoesfromnton(cid:8)1lineages,andamigration occupied demes, both with and without coalescence, event leaves all n (cid:8) 1 lineages in different demes. In and branching events, both with and without a migra- event4thenumberoflineagesremainsn,butnowtwo tion event to an unoccupied deme. of them are in the same deme, while in event 6 the KroneandNeuhauser(1997)andNeuhauserand number of lineages increases to n (cid:8) 1, and again two Krone(1997)usethetermcollisiontodenotetheevent areinthesamedeme.Tosummarize,thefirststepthat that an ancestral lineage splits and immediately co- matters in the limiting (D → ∞) process for n lineages alesces with another ancestral lineage. In the unstruc- in n different demes can be a coalescent event or a tured ancestral selection graph, all collisions become branching event, but in either case the remaining lin- negligible in the limit (N → ∞ and D (cid:1) 1), and it is eages are all in different demes; alternatively it can be not necessaryto distinguishbetween differentkinds of a migration event or a branching event that results in collisions.Here,becauseNisfinite,somecollisionswill two lineages residing together in the same deme while occurwithratescomparabletoregularcoalescentevents. the rest are in separate demes. Event5inTable1isofthissortandhasarateofO(1/ We note that Equations 13 and 14, correspond to D).However,event5isacollisioninwhichthelineage events that would disrupt the dual process or greatly splits and then immediately coalesces with itself. Both complicate the branching structure, respectively. The forward and backward in time, this has no effect. The zero rates with which we have described them above descendant lineage has the same parent regardless of refertotheirinstantaneousratesinthelimitingprocess, whethertheparentalalleleisA andthe2-arrowisnot 1 butdonotfullyrevealthatevenovertheentiregraph, followed or the parental allele is A and the 2-arrow is 2 untiltheultimateancestorisreached,theseeventswill followed.Werefertothiseventasaself-collision.Only occurwithprobabilityzerointhelimitasD→∞.Proof self-collisionshavethepotentialtooccurwithratescom- of this that allows us to omit these events without loss parable to regular coalescent events. Other kinds of of generality is deferred to appendix a. collisions, for example, event 8 in Table 1, which in- WiththesamelevelofdetailasinTable1,27different cludes some self-collisions, have rates of O(1/D2). The events can be distinguished for a sample of n lineages number of occurrences of all events with rates of this in n (cid:3) 1 demes, i.e., where a single pair resides in one magnitudeisshownlater,inappendixa,tobenegligi- deme. However, it is unnecessary to distinguish all of ble in the limiting (D → ∞) process. these, and Table 2 shows them grouped into just four Now,considerwhathappensinthelimitingprocess. kindsofevents.TheimportantdifferencefromTable1 Bysubstitutingthevalueof(cid:2) fromEquation3intothe 1 isthateventswithratesofO(1)nowhavethepotential entries of Table 1, the limiting rates become toaffectthestateofthesample.TheO(1)eventsinTa- U* (cid:1) limU (cid:1) ∞ (7) ble 2 are a coalescent event between the two lineages 1 1 D→∞ thatshareademeandamigrationeventthatsendsone U* (cid:1) limU (cid:1) ∞ (8) ofthesetwolineagestoanunoccupieddeme.Theother 2 2 D→∞ events that can change the sample are of O(1/D) or (cid:4)n(cid:5)Nm (cid:8) 1 (cid:3) m smaller,andthesearebranchingeventsandmigration U* (cid:1) limU (cid:1) (9) 3 D→∞ 3 2 N events to occupied demes. These fast events would be problematiciftheyweretoactuallyoccurinthelimiting (cid:4)n(cid:5)(N (cid:3) 1)(Nm (cid:8) 1 (cid:3) m) U* (cid:1) limU (cid:1) (10) process, but whenever a pair of lineages resides in the 4 D→∞ 4 2 N same deme events with rates of O(1) will dominate. (cid:6)(1 (cid:3) m)(N (cid:3) 1)(Nm (cid:8) 1 (cid:3) m) Theseevents,1and2inTable2,endwithallremaining U*6 (cid:1) lDi→m∞ U6 (cid:1) n2 N2m lineagesindifferentdemes.Thisguaranteesthatthere (11) will never be more than one multiply occupied deme U* (cid:1) limU (cid:1) n(cid:6)Nm (cid:8) 1 (cid:3) m (12) and,thatwhenthereis,itwillcontainjusttwolineages 7 D→∞ 7 2 N (appendix a). That is, in the limiting process, either a U* (cid:1) limU (cid:1) 0 (13) V or a V event always preempts a V event during the 8 8 1 2 4 D→∞ state where precisely one deme contains two lineages. U* (cid:1) limU (cid:1) 0, (14) 9 D→∞ 9 Onceintheresultingstate,Table1appliesandthehighest Many-DemesAncestralSelectionGraph 1123 TABLE2 The27distinguishabletype(cid:1)andtype2birth/deatheventsinthehistoryofnlineages currentlyinn(cid:3)1differentdemes,groupedintojustfourkindsofevents Rate Outcome V (cid:1)(cid:2)2m(cid:8)O(1/D) Migration(nlineagesinndemes) 1 1 V (cid:1)(cid:2)2(1(cid:3)m)(1/N)(cid:8)O(1/D) Coalescence(n(cid:3)1lineagesinn(cid:3)1demes) 2 1 V (cid:1)(cid:2)[(n(cid:3)2)(cid:8)2(1(cid:3)m)(N(cid:3)1)/N](cid:8)O(1/D) Nochange(nlineagesinn(cid:3)1demes) 3 1 V (cid:1)(cid:2)2s (cid:8)(cid:2)(n(cid:3)2)m(n(cid:3)2)/D(cid:8)O(s /D) Twoormorelineagesinatleast1deme 4 1 D 1 D (cid:6) rate events that affect the sample configuration are of U* (cid:8) P(mig)U* (cid:1) n , (18) 7 2 6 O(1/D). 2 Thus,inarelativelyshorttime,asampleofnlineages in n (cid:3) 1 demes will be converted to a sample of k respectively. Therefore, the rates of coalescence and branching in the limiting (D → ∞) ancestral process lineagesinkdemes.Thevalueofkdependsonwhether become identical to the rates of coalescence and a coalescent event or a migration event occurs. If it is branchinginapanmicticancestralselectiongraph,but acoalescentevent,thenk(cid:1)n(cid:3)1,andifitisamigration wheretimeismeasuredinunitsof(cid:2) generations.From event, then k (cid:1) n. On the timescale above, with (cid:2) (cid:1) 1 1 Equations 3 and 15 we can write (cid:2) (cid:1) ND/(2P(mig)). ND[1(cid:8)(1(cid:3)m)/(Nm)]/2,theseeventsbecomeinstan- 1 2 Since we assume that 0 (cid:11) m (cid:10) 1 and N (cid:12) 1, we have taneousbecausetheirratesapproachinfinityinthelimit 0(cid:11)P(mig)(cid:10)1and(cid:2) (cid:12)ND/2.Withamigrationrate D → ∞. The result is an instantaneous adjustment that of m (cid:1)2 1, the present1model reduces to a panmictic has two possible outcomes, with probabilities modelwith timescale(cid:2) (cid:1)ND/2.Otherwise, theeffect 1 V Nm ofisland-modelsubdivisionistolengthenthetimescale P(mig) (cid:1) lim 1 (cid:1) (15) 2 D→∞ V1 (cid:8) V2 Nm (cid:8) 1 (cid:3) m of the ancestral (and the forward-time) process. Note that the processes of selection and mutation and responddifferentlytosubdivision.Mutationscaleswith (cid:2), while the selection parameter (cid:6) scales simply with V 1 (cid:3) m 1 P(coal) (cid:1) lim 2 (cid:1) . (16) ND/2.Inotherwords,therearedifferenteffectivepopu- 2 D→∞ V1 (cid:8) V2 Nm (cid:8) 1 (cid:3) m lationsizesforselectionandformutation.Theeffective size for selection is smaller and is equal to P(mig) (cid:1) Thiswill actas afilteron thefour-rate Poissonprocess 2 Nm/(Nm(cid:8)1(cid:3)m)timestheeffectivesizeformutation. described by Equations 9–12. Starting with the sample The reason for this is that, when a branching event ofnlineagesinndifferentdemes,wheneveramigration creates a new lineage in the ancestral process, it has a event to an occupied deme occurs without immediate P(coal) (cid:1) (1 (cid:3) m)/(Nm (cid:8) 1 (cid:3) m) chance of being coalescence (event 4), there is a chance P(mig) that 2 2 erased by a coalescent event, so only P(mig) of them the sample will be returned to its original state. The 2 are observable. In the present model, this cancels, ex- rest of the time, i.e., with probability P(coal), event 4 is 2 actly, the factor 1/P(mig) by which the number of converted to event3, which is acoalescent event. When- 2 branching events that occur in the limiting process is ever a branching event occurs where both parents stay increased relative to the number under panmixia. For inthesamedeme(event6), thereisachanceP(coal) 2 thesamereason,thereisalsoadifferencebetweenthe thatitisconvertedintoself-collision,sothatthesample scalings for recombination, which splits lineages, and returns to its original state, and a chance P(mig) that it 2 mutation in similarly structured populations (Nord- isconvertedintoanobservablebranchingevent(event7). borg 2000; Lessard and Wakeley 2004). It is straightforward to show that a Poisson process Thescatteringphase:Theresultsobtainedaboveap- filtered in this way is equivalent to a Poisson process plytothehistoryofasampleofnlineagesinndifferent withanadjustedrate;forexample,seeWakeley(1999). demes.Theyfollowfromthefactthatcoalescentevents Thus,theprobabilitiesP(mig)andP(coal)narrowthe 2 2 within multiply occupied demes and migration events relevant number of O(1/D) events to just two. Using from multiply occupied demes to unoccupied demes Equations 9–12, and simplifying, the limiting rates of occur with rates that are D times larger than the rates coalescence and branching become of branching events and migration events to occupied (cid:4)n(cid:5) demes.Becauseofthis,whenevermultiplesamplesare U* (cid:8) P(coal)U* (cid:1) (17) 3 2 4 2 taken from one or more demes, there will be a brief scattering phase and this will leave the remaining lin- and eagesindifferentdemes(cf.Table2)sothatthecollect- 1124 P.F.SladeandJ.Wakeley ing-phase results above apply. The number of lineages witheachremaininglineageinaseparatedeme,E[Y|n(cid:13)] thatremainattheendofthescatteringphase,andthat istheexpectedvalueofY forasampleofsize|n(cid:13)|inour then enter the collecting phase, will depend on the rescaled,unstructured,collecting-phaseASG.Below,we numberofcoalescenteventsthatoccurwithinthemulti- usetheframeworkofEquation21bothanalyticallyand plysampleddemeordemes.InthelimitasD→∞,the insimulations,whereourresultsarealsoconsistentwith durationofthescatteringphasebecomesnegligibleso keeping track of allelic configurations during the scat- that it can be treated as instantaneous when time is tering phase. measured in proportion to D generations. A compression of the Markov process that describes Becausewehaveassumedthats isO(1/D),whereas the conditional evolution of the unstructured ASG is D the rates of within-deme coalescence and migration to found. This continuous-time Markov jump chain is an occupieddemesdependonlyonN,m,andthesample extension of the conditional ancestral selection graph size(s), thescattering phasehere is thesame as thatin of Slade (2000a) and is derived with new clarity. The apopulationinwhichallgeneticvariationisselectively result is a maximal compression of the ancestral selec- equivalent.AsinWakeley(1998),thescatteringphase tiongraphinwhichthetimingpropertiesofthegraph forasampleofsizenallfromasingledemeisequivalent areretained.Thiscontinues,andimprovesupon,asim- to a series of n Bernoulli trials with probabilities of ilar enhancement of the conditional graph by Slade success (2000b)thatreducesitsbranchingrate.Thisisanalter- nativeversionoftheminimalrepresentationofthecon- (cid:1) P(mig) (cid:1) , (19) ditional graph as discussed in Slade and Neuhauser k (cid:1) (cid:8) k (cid:3) 1 (2003). Without the timing structure of the graph in for k (cid:1) 1, 2,..., n, and in which (cid:1) (cid:1) Nm/(1 (cid:3) m). place,resultsthatdescribesomeprobabilisticproperties Note that k (cid:1) 2 yields P(mig) given in Equation 15. ofasingleancestrallineagewithintheASGareobtained 2 Success is defined to be the migration of one of the byFearnhead(2002).Thedetailsofourgeneralderiva- lineages to an unoccupied deme, and each of these tion are deferred to appendix b. increasesbyonethenumberoflineagesthatwillenter Thesimulationspresentedinresultsareperformed thecollectingphase.Ifweusen(cid:13)todenotethenumber usingthecorrespondingadaptationofacomputational of lineages that will enter the collecting phase, then Monte Carlo method as in Slade (2000a,b). The pur- pose of such a simulation scheme is the calculation of |S(n(cid:13))|(cid:1)n(cid:13) P[n(cid:13)|n] (cid:1) n (20) anapproximateprobabilitydistributionovertherealiza- (cid:1) (n) tionsofthegenealogy.Foraparticulargenealogythen, a weight is attributed to its associated time to the most is its probability function (Wakeley 1998), in which the|S(n(cid:13))|areunsignedStirlingnumbersofthefirstkind recentcommonancestor(TMRCA),andtheweightedaver- n and (cid:1) (cid:1) (cid:1)((cid:1) (cid:8) 1)···((cid:1) (cid:8) n (cid:3) 1). Finally, the ageisevaluateduponcompletionofalargenumberof (n) repetitions.Theimprovedefficiencyinhavingattained scatteringphaseoccursindependentlyamongdemes,so the maximal compression of the genealogical Markov thatwecanmultiplytheprobabilities(20)oversamples chain is substantial; running time until convergence from different demes. thatwaspreviouslyrequiredisnowatleastfullyhalved. The feasibility of at least doubling the branching and METHODOLOGY mutation rates is also achieved. The performance gets betterstillwhencalculatingstatisticsoftheT distri- We have shown that the ancestry of a sample from a MRCA bution,asitwouldalsodoforpropertiesofsubtreesof subdivided population, with equal deme sizes and is- thegenealogy.Ourcompressionreducesthenumberof land-typemigration,andinwhichtwoallelesaresubject possibletransitionsrequiredtodescribetheconditional to selection and mutation, includes an instantaneous ASG at any event, from 10 to only 6. Thus, the size of scattering phase and then enters a collecting phase, thestatespaceofarealizationoftheconditionalgraph whichisequivalenttotheunstructuredASGwithmuta- decreases exponentially. Hence, variance of any corre- tionandselectionparametersscaledappropriately.This spondingsimulation techniqueis reduced.Depending meansthatifYisanymeasurementonthesample,such on the parameter combinations, to produce the com- as the time to common ancestry or the total length posite PDFsin resultswith a Pentium3.06GHz Xeon of the genealogy, we can compute properties of Y by processor,computingtimerangedfromafewhoursto conditioning on the outcome of the scattering phase. several days. For example, if E[Y|n] is the expected value of Y for thesamplen(cid:1)(n1,...,nd)ofsize|n| (cid:1) (cid:1)di(cid:1)1ni fromd different demes, then RESULTS E[Y|n] (cid:1) (cid:1)E[Y|n(cid:13)]P[n(cid:13)|n], (21) From Equation 21 we can see that simple estimators n(cid:13) ofthemigrationparameter(cid:1)areasusefullwhenweak where P[n(cid:13)|n] is the product of probabilities like (20) selection is present as they are when all variation is over demes. Because the scattering phase always ends neutral. For example, let q and q be the probabilities 0 1 Many-DemesAncestralSelectionGraph 1125 ofidentityinstateforsamplesofsizetwotakenfromthe collectingphase.Lowmigrationratesyieldsamplecon- samedemeandfromtwodifferentdemes,respectively. figurationsatthestartofthecollectingphasethatcon- Taking the scattering phase into account, we have sistofclosetodA allelesanddA alleles,providedall 1 2 of thed demes initiallysampled contained atleast one (cid:1) 1 (cid:3) q (cid:1) (1 (cid:3) q), (22) ofeachallele.Thehighestsensitivitytoselectionthere- 0 (cid:1) (cid:8) 1 1 fore is achieved when the population is structured be- tween very many demes connected by an exceedingly and if qˆ and qˆ are estimates of q and q from some 0 1 0 1 lowmigrationrate.Notethatthiseffectisindependent data, then ofwhethertheinitialsamplecontainsmostlythefavored (cid:7)(cid:1)(cid:1) (cid:2)11(cid:3)(cid:3)qˆqˆ1 (cid:3) 1(cid:3)(cid:3)1 (23) orTthoecounnvfearvtoarecdoaalleleslcee.nttimeunitintoyears,itisnec- 0 essarytocalculatetheproductofthe(estimated)effec- is an estimate of (cid:1). Note that (23) has the same form tive population size and number of years per repro- asthevariousestimatorsofgeneflowbasedonFSTorpair- ductivegeneration.Whenthemigrationrate(cid:1)(cid:14)1we wisesequencedifferencesundertheassumptionofselec- compare the structured model only since according to tive neutrality; see, for example, Hudson et al. (1992). thefactorcalculatedintheory,1(cid:8)(cid:1)(cid:3)1,acoalescent Neither theunstructured ASGnor ourslightly more timeunitintheunstructuredASGrepresentsincompa- complicated structured ASG lends itself to analytical rablyfewernumbersofgenerations.Ontheotherhand, computations much more involved than the above. when(cid:1)(cid:15)1thereisameaningfulcomparisonbetween From Equation 21 it is clear how analysis of our struc- results with and without very many demes. When (cid:1) (cid:1) tured ASG is related to analysis of the unstructured 0.01 in the structured genealogies (both neutral and graph. So, for example, it is possible to average such ex- selected)everygenerationthatelapsesaccordingtocoa- pressionsasKroneandNeuhauser’s(1997)Equation lescent time translates as 100 additional generations 3.5 for the expected time to the ultimate ancestor of in an unstructured genealogy. Population subdivision the sample over n (cid:1) |n(cid:13)| at the start of the collecting magnifies the extent of changes to coalescence times phase. Instead of this, we have used simulations that andthusimmediatelyaccentuatestheeffectofselection includethescatteringphaseandthemaximalcompres- on genealogical time depth. sion of the ASG described in appendix b to compute The interaction between migration and selection as theprobabilitydensityfunctionofthetimetothemost it affects the distribution of the conditional T is MRCA recent common ancestor of a sample (TMRCA). shown in Figures 2 and 3. The mutation rate is set at In unstructured nonneutral genealogies with small three levels, (cid:7) (cid:1) 0.01, 0.1, and 1. Low mutation rates mutationratescertainsampleconfigurations,amonga areappropriatesinceweconsiderallelescorresponding sample of size n, have reduced mean coalescent times toaminoacidreplacementsatahypotheticalnucleotide whencomparedtotheirneutralcounterparts,andother sitewithinasinglelocus.(Wenotethat(cid:7)(cid:1)0.001yields sample configurations have enlarged coalescent times a distribution of the T under neutrality that is ex- MRCA (Slade 2000a; see also Slade and Neuhauser 2003). tremely similar to that obtained for (cid:7) (cid:1) 0.01.) The The mixtures of reduced sample configurations at the selectionrateinourthreecasesischoseninaccordance start of the collecting phase in the structured model withtheweightedaveragepredictedtobeoperatingat resultinnoveldistributionsoftheTMRCAunderselection. nucleotide sites by Sawyer et. al(2003), namely (cid:6) (cid:1) 5 This mixture is the same under neutrality as it is with and 7.5. The migration rate is set to two levels, (cid:1) (cid:1) selection.Theeffectofthescatteringphaseunderneu- 0.01 in Figure 2 and (cid:1) (cid:1) 10 in Figure 3. tralityisgovernedbythelevelofmutationinthecollect- The probability density function (PDF) for the low ing phase. Although the sample size is reduced, it can migrationrateandlowmutationratecaseispresented easilycorrespondtoanincreasedTMRCAbecausethead- in Figure 2A. A substantial change in the shape of the justed sample configuration requires longer to assimi- PDF is clear under selection. Increasing the mutation late its ancestors (particularly for low mutation rates). rate, the next lowest migration rate case is shown Fig- IntheunstructuredASGasthemutationratedecreases ure2B.Theextentoftheeffectofselectionappearsto the conditional genealogy becomes more sensitive to diminish at our highest level of the mutation rate in thepresenceofselection(Slade2000a).However,the Figure 2C. The parameters (mean and standard devia- sample configurations that yield the greatest selective tion) of these distributions are compared in Table 3. effect are also the samples least likely to be obtained, In the last case just mentioned, we have a particularly havingextremelysmallsamplingprobabilities.Thesam- novel result as the mean starts to increase again, i.e., pling distribution is increasingly U-shaped as (cid:7) de- toward neutrality, with the selection strength above creases,andunderselection,samplesdominatedbythe some threshold. In both Figure 2 and Table 3 it is as- favoredallelearemorelikelythanthosedominatedby sumedthatsamplesofsize12weretakenfromeachof the unfavored allele. Introducing population subdivi- fourdemes,andthateachofthefoursamplesincluded sionmakesthemostsensitivesamplesaccessiblebecause onecopyofalleleA and11copiesofalleleA ,or(1,11) 1 2 the scattering phase delivers adjusted samples to the for short. 1126 P.F.SladeandJ.Wakeley Figure3.—PDFsfortheT ofasampleofsize48spread MRCA evenlyamongfourdemes,butcomparingtwosampledistribu- tionsofallelesA andA :(1,11)asinFigure2and(11,1). 1 2 In all cases, (cid:1) (cid:1) 10 and (cid:7) (cid:1) 0.1. As in Figure 2, the solid curve shows the PDF for (cid:6) (cid:1) 0; the long-dashed curves, for (cid:6)(cid:1)5.0;andtheshort-dashedcurves,for(cid:6)(cid:1)7.5. (11, 1), respectively. The effect on the shape of the PDFscanbediscerned;however,inTable4comparisons of the parameters of the distributions under panmixia are also shown. With a migration rate as high as (cid:1) (cid:1) 10, the effects of many-demes population structure in Table4,forbothneutralandselectedgenealogies,can still be distinguished. In the neutral case (Table 4, left column),evenwiththesamplesizeadjustmentthemean Figure2.—PDFsfortheTMRCAofasampleofsize48spread TMRCA is considerably higher than it is for panmixia. evenlyamongfourdemesandwithsampleallelecounts(1A , 1 ComparingthetworowsofTable4,wecanseethatthis 11A ). A–Cdepict resultsfor threedifferent mutationrates asind2icated.Themigrationparameterwas(cid:1)(cid:1)0.01,andthe also reflects the 10% lengthening of the genealogies solidcurveshowsthePDFfor(cid:6)(cid:1)0;thelong-dashedcurves, expectedwhen(cid:1)(cid:1)10.Thedifferenceintheselective for (cid:6) (cid:1) 5.0; and the short-dashed curves, for (cid:6) (cid:1) 7.5. The effectbetweenthetwocompletelydifferentinitialsam- caseof(cid:7)(cid:1)0.01and(cid:6)(cid:1)7.5couldnotbecompleteddueto ples is only minor. prohibitivecomputationalrequirements. DISCUSSION Incontrast,whenthemigrationrateishighthesam- ple sizeadjustment is moderate, andthe initial sample Wehaveobtainedasimplestructuredancestralselec- configurationsshowverysimilarresponsestoselection. tion graph in the limit as the number of demes tends The PDFs shown in Figure 3, A and B, correspond to toinfinityandwithisland-typemigrationamongdemes different initial samples within each deme of being ofequalsize.Theresultisunderstoodasanapproxima- either all (1, 11) of A and A alleles or vice versa all tiontotheancestralprocessforsamplesinthepresence 1 2
Description: