ebook img

Evolutionary Processes in Finite Populations PDF

0.58 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Evolutionary Processes in Finite Populations

Evolutionary Processes in Finite Populations Dirk M. Lorenz1, Jeong-Man Park1,2 and Michael W. Deem1 1Department of Physics, Rice University, Houston, TX 3 1 2Department of Physics, The Catholic University of Korea, Bucheon, Korea 0 2 January 28, 2013 n a J Abstract 5 2 We consider the evolution of large but finite populations on arbitrary fitness landscapes. We describe the evolutionary process by a Markov, ] Moran process. We show that to O(1/N), the time-averaged fitness is E lowerforthefinitepopulationthanitisfortheinfinitepopulation. Wealso P show that fluctuations in the number of individuals for a given genotype . o canbeproportionaltoapoweroftheinverseofthemutationrate. Finally, i weshowthattheprobabilityforthesystemtotakeagivenpaththrough b - the fitness landscape can be non-monotonic in system size. q [ 1 Introduction 3 v 3 Natural populations are characterized by finite sizes. For this reason, it is im- 2 possible for biology to sample the entire space of all possible genotypes. Even 0 thenumberofpossiblesequenceswithhighfitnessistypicallymuchlargerthan 6 thepopulationsizeinnaturallyoccurringpopulations. Effectsduetofinitepop- . 4 ulation size are particularly pronounced in asexual populations. For example, 0 the reduction of fitness in a finite population without back mutation is termed 2 Muller’s ratchet [1], and the decreased speed of evolution in a finite population 1 without recombination is termed the the Hill-Robertson effect [2]. : v The relative influence of different evolutionary forces changes between small i X and large populations. While stochastic effects such as genetic drift act more r strongly on small populations, natural selection acts more effectively on large a populations. Many results in classical population genetics have focused on the limiting cases of small or infinite populations. In sufficiently small populations, beneficialmutationsoccurbutrarelysurvivelongenoughtobecomeestablished in the population. Those mutations that survive, however, can spread through a small population, reaching fixation, before another beneficial mutation arises. This regime is referred to as successional-mutations regime [3, 4] and is fairly well-understood. This theory has been useful, for example, to understand evo- lution of transcription factor binding sites [5]. As the population size increases, beneficial mutations arise more frequently. Fixation of individual mutations 1 does not occur before the arrival of another beneficial mutation. In asexual populations this leads to competition between descendants of each of the mu- tations — an effect referred to as clonal interference [6]. As the population becomes even larger, ultimately stochastic effects become negligible, and the time-evolution of the evolving population can be described by a set of ordinary differential equations. This regime has been studied extensively in quasispecies theory, albeit often only for simple fitness functions. Here we investigate the regime between clonal interference and quasispecies theory. We seek to predict the evolutionary dynamics followed by a large yet finite population and how this dynamics differs from that of an infinite popu- lation. The study of finite-population effects requires a stochastic description basedonamasterequation[7]. Wemakenoassumptionaboutthefitnessland- scape upon which the population evolves. We show that, averaged over time, theaveragefitnessofalargefinitepopulationislowerthanthatofapopulation ofinfinitesize. Inotherwords, forlargeasexualpopulationsevolvingonafixed fitness landscape, an increase in population size is accompanied by an increase in the average fitness. Furthermore, small mutation rates lead to high fluctu- ations and correlations. In particular, for small mutation rates, fluctuations and correlations in the number of individuals for a given genotype are inversely proportional to a power of the mutation rate. These large correlations enhance finitepopulationeffectsandmaketheconvergencetoinfinite-populationbehav- ior occur only for extremely large populations. This article is organized as follows. We describe the stochastic process un- derlying our studies in section 2. We explain how this dynamic process can be written as a field theory. We derive analytic results for the infinite population evolution from this field theory. We describe finite population effects in section 3. We introduce the fitness landscape that we use to illustrate our results in section 4. In section 5 we investigate fluctuations in this random process. We verifyouranalyticresultsusingstochasticsimulationsinsection5. Weconclude in section 6. 2 Stochastic Process Mapped to a Field Theory Throughoutthisarticle, weusetheMoranprocesstomodelevolutionofapop- ulation[8]. Theindividualsinthepopulationareidentifiedbytheirgenotype,a sequenceoflengthl. Inthiscontinuous-timeprocessaconstantpopulationsize, N, is maintained by simultaneous replication and death. The individual to be replicatedischosenrandomlyfromthepopulationwithprobabilityproportional to its microscopic fitness, while the individual to be killed is chosen randomly from the population with uniform probability. We further assume that repli- cation and mutation are independent. Thus, there are two classes of events: mutation and replication. Mutation from genotype i to genotype j occurs at a rate of µ∆ N , where µ is the mutation rate per locus, N is the number of ij i i individuals with genotype i, and ∆ is equal to one if an individual can mu- ij tate from sequence i to sequence j with a single mutation and ∆ is equal to ij 2 zero otherwise. This description allows for the incorporation of back-mutations whichareoftenignoredintheliterature. Notethattheanalyticalresultsinthis paper do not depend on this binary form of the matrix ∆. Its elements can be arbitrary non-negative numbers as would be appropriate if back-mutation rates differed from forward mutation rates. Replication of genotype i and simultane- ousdeathofgenotypej occursatarateof 1r N N ,wherer isthereplication N i i j i rate of sequence i. The stochastic master equation for this process is ∂ (cid:88) P(N;t)=µ ∆ [(N +1)P(N +e e ;t) N P(N;t)] ij i i j i ∂t − − i,j 1 (cid:88) (cid:88) + r [(N 1)(N +1)P(N e +e ;t) N N P(N;t)]. i i j i j i j N − − − i j=i (cid:54) (1) Here N is a vector describing the state of the population by the number of individuals of each genoptype: (N ,N ,...), and e is a unit vector associated 1 2 i (cid:80) with genotype i. Note N =N. i i We obtain analytic expressions for the average occupation numbers and the fluctuationsbymappingthestochasticprocessdescribedintheprevioussection onto a field theory following [9]. To do this we introduce the state vector (cid:88) ψ(t) = P(N;t) N (2) | (cid:105) | (cid:105) N whose time evolution is governed by  ∂ (cid:88) (cid:88) ψ(t) = µ ∆ij[(Ni+1)P(N +ei ej;t) NiP(N;t)] ∂t| (cid:105) − − N i,j  1 (cid:88) (cid:88) + ri [(Ni 1)(Nj +1)P(N ei+ej;t) NiNjP(N;t)] N . N − − − | (cid:105) i j=i (cid:54) (3) By defining annihilation and creation operators aˆi|N(cid:105)=Ni|N −ei(cid:105), aˆ†i|N(cid:105)=|N +ei(cid:105) aˆiaˆ†j −aˆ†jaˆi =δij, (4) we can write the governing equation for the state vector as ∂ ψ(t) = Hˆ ψ(t) , (5) ∂t| (cid:105) − | (cid:105) where (cid:88) (cid:16) (cid:17) 1 (cid:88) (cid:16) (cid:17) −Hˆ =µ ∆ij aˆ†j −aˆ†i aˆi+ N riaˆ†i aˆ†i −aˆ†j aˆiaˆj. (6) i,j i,j 3 This differential equation has the formal solution ψ(t) =e Hˆt ψ(0) , (7) − | (cid:105) | (cid:105) where ψ(0) =(cid:12)(cid:12)N0(cid:11) is the initial distribution of individuals in the population. | (cid:105) At time T, the average of an observable represented by the (normal-ordered) ope=rat0or((cid:81)F({eaˆaˆ†ii),aˆi}) can be obtained [10] by multiplying with the “sum bra” (cid:104)·| (cid:104) | i (cid:104)F(cid:105)T =(cid:104)·|F({aˆ†i,aˆi})|ψ(T)(cid:105)=(cid:10)·(cid:12)(cid:12)F({aˆ†i,aˆi})e−HˆT (cid:12)(cid:12)N0(cid:11). (8) We introduce a Trotter factorization for the evolution operator e HˆT, using a − timeinterval(cid:15) 0, inthebasisofcoherentstatesdefinedbyaˆ z =z z and i i → | (cid:105) | (cid:105) obtain a path integral representation (cid:10)·(cid:12)(cid:12)F({aˆ†i,aˆi})e−HˆT (cid:12)(cid:12)N0(cid:11)=(cid:10)·(cid:12)(cid:12)F({aˆ†i,aˆi})e−(cid:15)Hˆ ·e−(cid:15)Hˆ · ··· ·e−(cid:15)Hˆ (cid:12)(cid:12)N0(cid:11) (cid:90) = [Dz Dz]F( z(T/(cid:15)) )e S(z,z∗). (9) ∗ − { } Here, the action in the exponent is, after the change of variables z 1+z¯, i∗ ≡ i   T/(cid:15) T/(cid:15) (cid:88) (cid:88) (cid:88) S(z,z¯)=  z¯i(k)zi(k) z¯i(k)zi(k 1) Ni(0)ln(1+z¯i(0)) − − − i k=0 k=1 T/(cid:15) (cid:88)(cid:88) µ(cid:15) (z¯ (k) z¯(k))z (k 1)∆ j i i ij − − − k=1 i,j T/(cid:15) (cid:15) (cid:88)(cid:88) r (1+z¯(k))(z¯(k) z¯ (k))z (k 1)z (k 1). (10) i i i j i j − N − − − k=1 i,j The population dynamics in the limit as the population size, N, becomes infiniteemergesasasaddlepointintheaction[9]. Setting δS/δz (t) =0leads i |c toz¯c(t)=0. Fromsetting δS/δz¯(t) =0weobtainzc(t)=Np (t)wherep (t) i i |c i i i obeys the differential equation dpi (cid:88) =µ (∆ p ∆ p )+r p r p . (11) ji j ij i i i i dt − −(cid:104) (cid:105) j (cid:80) Here r = r p is the average fitness of the infinite population. This differ- (cid:104) (cid:105) j j j ential equation has the closed-form solution [11] (cid:80) (cid:0)eYt(cid:1) p (0) j ij j pi(t)= (cid:80) (eYt) p (0), (12) a,j aj j (cid:80) where the matrix Y is defined by Y =µ∆ µδ ∆ +δ r . ij ji− ij k ik ij i 4 3 Finite Population Shift to Probability Distri- bution We proceed to quantify analytically how finite population effects alter the in- finite population dynamics. To do so we expand the action about the saddle point and separate it into a Gaussian and a non-Gaussian part. Introducing z (k)=z (k)+δz (k) and z¯(k)=δz¯(k) in Eq. 10 we can write S =S +∆S, i ci i i i 0 where the reference action S can be written as 0 1 S0 = 2xT ·Π−01·x (13) where xT =( δz¯(0),δz(0) , δz¯(1),δz(1) , , δz¯(T/(cid:15)),δz(T/(cid:15)) ). (14) { } { } ··· { } Here,    (cid:0)Π−01(cid:1)00 −(cid:0)Π−01(cid:1)01 0 0 ···     −(cid:0)Π−01(cid:1)10 (cid:0)Π−01(cid:1)11 −(cid:0)Π−01(cid:1)12 0 ···  Π−01 = 0 −(cid:0)Π−01(cid:1)21 (cid:0)Π−01(cid:1)22 −(cid:0)Π−01(cid:1)23 ...  (15)    0 0 −(cid:0)Π−01(cid:1)32 (cid:0)Π−01(cid:1)33 ...    ... ... ... ... ... with (cid:18) (cid:19) (cid:18) (cid:19) (cid:0)Π−01(cid:1)00 = δijNδi(0) δ0ij , (cid:0)Π−01(cid:1)kk = −(cid:15)δ(B)ij δ0ij , ij ij (cid:18) (cid:19) (cid:18) (cid:19) (cid:0)Π−01(cid:1)k,k 1 = 00 δij +0(cid:15)(A)ij , (cid:0)Π−01(cid:1)k 1,k = δ +0(cid:15)(A)T 00 . (16) − − ij ij The matrices A and B are (cid:32) (cid:33) (cid:88) 1 (A) =µ∆ µδ ∆ + r z (k 1)+δ r ij ji− ij im N i ci − ij i m (cid:32) (cid:33) 1 (cid:88) 1 δ r z (k 1) r z (k 1), (17) − N ij m cm − − N j ci − m and 1 (B) =2δ r z (k 1) (r +r )z (k 1)z (k 1). (18) ij ij i ci − − N i j ci − cj − 5 The non-Gaussian part of the action is given by (cid:20) (cid:21) (cid:88) 1 ∆S = N (0) ln(1+δz¯(0)) δz¯(0)+ (δz¯(0))2) i i i i − − 2 i T/(cid:15) (cid:15) (cid:88)(cid:88) [r (δz¯(k) δz¯ (k))δz (k 1)δz (k 1) (19) i i j i j − N − − − k=1 i,j +r δz¯(k)(δz¯(k) δz¯ (k))z (k 1)δz (k 1) i i i − j ci − j − +r δz¯(k)(δz¯(k) δz¯ (k))δz (k 1)z (k 1) i i i − j i − cj − +r δz¯(k)(δz¯(k) δz¯ (k))δz (k 1)δz (k 1)]. i i i j i j − − − This formulation allows us to calculate averages using the Gaussian action and thermodynamic perturbation theory, which is equivalent to a cumulant expansion. The average occupation numbers are given by (cid:90) (cid:104)Ni(cid:105)T =(cid:10)·(cid:12)(cid:12)aˆ†iaˆie−HˆT (cid:12)(cid:12)N0(cid:11)=(cid:10)·(cid:12)(cid:12)aˆie−HˆT (cid:12)(cid:12)N0(cid:11)= [Dz∗Dz]zi(T/(cid:15))e−S(z,z¯) (20) (cid:90) = [Dz∗Dz]zi(T/(cid:15))e−∆Se−S0 =(cid:10)zi(T/(cid:15))e−∆S(cid:11)0 (21) = z (T/(cid:15)) z (T/(cid:15))∆S + 1(cid:10)z (T/(cid:15))(∆S)2(cid:11) + (22) (cid:104) i (cid:105)0−(cid:104) i (cid:105)0 2 i 0 ··· =Np (T) δz (T/(cid:15))∆S + 1(cid:10)δz (T/(cid:15))(∆S)2(cid:11) + , (23) i −(cid:104) i (cid:105)0 2 i 0 ··· where the last step follows from (∆S)n = 0 n Z,n 1. This procedure (cid:104) (cid:105)0 ∀ ∈ ≥ leadstoanasymptoticexpansionfortheoccupationnumbersinpowersof1/N. To first order, we obtain 1 1 (cid:90) T (cid:88) N (T) p (T)+ dt Π zz¯(T,t)Π zz(t,t)(r r ). (24) N (cid:104) a(cid:105) ∼ a N2 0ai 0ij i− j 0 i,j This expansion about infinite size is accurate when the correction term on the right hand side of Eq. (24) is much smaller than p (T). Equation (36) provides a an estimate of the magnitude of the correction for a common landscape with k intermediate steps. The second order term is given by Eq. A.1 in the appendix. We derive expressions for the matrices Π zz¯(T,t) and Π zz(t,t) by inverting 0ai 0ij Π−01 in Eq. 15. In continuous time for T >t, they obey ∂Π zz¯(T,t) 0 =A(T)Π zz¯(T,t), (25) 0 ∂T with Π zz¯(t,t)=I (26) 0 6 and dΠ zz(t,t) 0 =B(t)+A(t)Π zz(t,t)+Π zz(t,t)AT(t), (27) 0 0 dt with Π zz(0,0)= δ N (0). (28) 0ij − ij i Using the expression for the first-order shift to the occupation numbers due tofinitepopulationeffects,wecalculatethefinitepopulationshiftintheaverage fitness of the population. The average fitness correction is 1 (cid:90) T (cid:88) δr(T) = dt r Π zz¯(T,t)Π zz(t,t)(r r ) (29) (cid:104) (cid:105) N2 a 0ai 0ij i− j 0 i,j,a = 1 (cid:90) T dt (cid:88)r Π zz¯(T,t)(cid:0)Π zz(t,t)+Nδ p (t)(cid:1)r , (30) −N2 a 0ai 0ij ij i j 0 i,j,a This result shows that the correction to the mean fitness is (1/N) the mean O fitnessinthelimitofinfinitepopulation. Thisresultcanberewritteninamore revealing form. Let r¯(t) be a random variable defined as 1 (cid:88) r¯(t) r (N (t) N (t) ) (31) i i i ≡ N −(cid:104) (cid:105) i in the limit of large population size. The finite population correction to the average fitness can then be written as (cid:90) T δr(T) = r¯(T)r¯(t) dt (32) (cid:104) (cid:105) − (cid:104) (cid:105) 0 and its time integral as (cid:90) T (cid:90) T (cid:90) t 1(cid:42)(cid:32)(cid:90) T (cid:33)2(cid:43) δr(t) dt= dt dt r¯(t)r¯(t) = r¯(t)dt . (33) (cid:48) (cid:48) (cid:104) (cid:105) − (cid:104) (cid:105) −2 0 0 0 0 Thisexpressionfortheaveragefitnesscorrection, whichresemblesafluctuation dissipation theorem, implies that the time-average of the finite-population shift is always negative. In other words, the average fitness of a large finite popula- tion is smaller than that of a population of infinite size. Note that this result is perturbative, valid for large population size N, and it does not require the average fitness to be a monotonic function of N for small N. On complex fit- ness landscapes, it is possible for small asexual populations to achieve a higher average fitness than larger ones [12]. Nonetheless, for sufficiently large pop- ulation sizes, the time-integrated average fitness increases monotonously with population size. 7 12 1 4 1 123 213 21 13 132 0 7 0 2 5 2 312 31 ⇒ 23 231 3 6 3 321 32 Figure 1: Left-hand side: the state-space for a fitness landscape with three forward-mutations and no back-mutations. Each node, i, is a particular geno- type. The replication rate of each genotype is r . Right-hand side (discussed i in Section 6): The state-space can be expanded to include mutational histories. Each two-mutation state is split into 2! = 2 states while the three-mutation state is split into 3! = 6 states. The node is now identified by a vector which conveys the mutational history of a particular path through the landscape. 4 The Landscape The analytical expressions developed in this paper are applicable to arbitrary fitnesslandscapesandmutationalpathways. However,wenowdescribeinsome detail the implications for fitness landscapes [13] defined by a certain number of fitness loci l with two alleles each. Genotypes that differ from each other by exactly one point mutation in one of the loci are connected in the mutation matrix. Each position in sequence space is thus connected by a mutation event tol othergenotypes. Figure1showsthegeometryofthelandscapeforthecase ofthreeloci. Typicallyinthislandscape,thefitnessofeachstateincreasesupon moving to the right in the figure. 5 Fluctuations around the Mean The matrices Π zz(t,t) and Π zz¯(T,t) can be understood intuitively. In the 0 0 limit of large N, the off-diagonal elements of Π zz(t,t) describe the covariances 0 between the occupation numbers at time t while the diagonal elements are re- lated to the variances of the occupation numbers at time t by (cid:18) (cid:19) 1 1 1 (δN (t))2 p (t)+ Π zz(t,t) . (34) N2 a ∼ N a N 0aa Atdifferenttimes,Π zz(T,t)givesthecross-covariancesbetweentheoccupation 0 numbers at times T and t. The matrix Π zz¯(T,t) relates the correlations at 0ai different times to the same-time correlations via Π zz(T,t)=Π zz¯(T,t)Π zz(t,t). (35) 0 0 0 8 We observe numerically that for small mutation rates, the fluctuations are pro- portional to a negative power of the mutation rate. Specifically, (cid:18) (cid:19)k 1 1 r (δN (t))2 , (36) N2 a ∼ N µ where k is the number of mutational steps as shown in Fig. 2. This dependence can also be shown analytically for sufficiently simple landscapes. See section B in the appendix for one example. Thus the expansion, which naively appears to be in 1/N is actually in 1/(Nµk). Thus, the expansion breaks down when µ<1/N1/k. The expansion is valid for large N and µ 1/N1/k. (cid:29) 107 1010 1010 dt dt dt zzd(t,t)/Π0ii110035 zzd(t,t)/Π0ii105 zzd(t,t)/Π0ii105 x t,i x t,i x t,i ma 101 ma ma 100 100 10−7 10−6 10−5 10−4 10−7 10−6 10−5 10−4 10−6 10−5 10−4 Mutation Rate, µ Mutation Rate, µ Mutation Rate, µ (a) (b) (c) Figure 2: The maximal change of the variance with time (+), i.e. max dΠ zz(t,t)/dt where Π zz is obtained from Eqs. 27 and 28, depends on t,i 0ii 0 the mutation rate as an inverse power law. Shown are calculations for a non- epistatic version of the landscape as described in section 4 with a) two possible mutations — r = 0,∆r 0.049,∆r 0.010, b) three possible mutations — 0 1 2 ≈ ≈ r =0,∆r 0.049,∆r 0.010,∆r 0.002—andc)fourpossiblemutations 0 1 2 3 ≈ ≈ ≈ — r = 0,∆r 0.049,∆r 0.020,∆r 0.006,∆r 0.002. In this case, 0 1 2 3 4 ≈ ≈ ≈ ≈ the fitness of each state is simply the sum of contributions from each mutation. The solid lines indicate power law fits using the values for µ 10 5. Their − ≤ exponents are a) -1.999, b) -2.989, and c) -3.939. The exponent is observed to be equal to the number of mutational steps in the landscape. We verify our analytical results by performing stochastic simulations using the Lebowitz/Gillespie algorithm [14, 15]. Rewriting Eq. 24 for the first order shifts to the occupation numbers, 1 (cid:90) T (cid:88) N (T) Np (T) dt Π zz¯(T,t)Π zz(t,t)(r r ), (37) (cid:104) a(cid:105) − a ∼ N 0ai 0ij i− j 0 i,j we observe that the finite population correction converges to a constant value forlargeN. Theaveragereplicationrateinthepopulationislinearintheoccu- pation numbers. It is equal to 1 (cid:80) r N (t). Therefore, the average replication N i i i rate also converges to the quasispecies result in the limit of a large population. That is, the average replication rate is equal to that of the infinite population 9 plus a correction that is of order 1/N smaller. Figure 3 shows this conver- gence for one set of parameters. As a further check on our analytic results, we fit a cubic polynomial in 1/N to the simulation data displayed in Fig. 3. For the particular fitness parameters chosen here, the coefficients from this fit are 320.4 2.5fortheconstanttermand( 5.3 0.8) 105 forthelinearterm,while ± − ± × our theory predicts 319.0 and 5.2 105, respectively. Here, the coefficient of − × the linear term is obtained from Eq. A.1 in Appendix A. Similarly, we observe 400 0 0 > r a 200 < p − −0.002 − N 0 N> > /a−0.004 a N N < < −200 a ra−0.006 Σ −400 3 4 5 6 3 4 5 6 10 10 10 10 10 10 10 10 Population Size, N Population Size, N (a) (b) Figure 3: (a) Finite-population correction to the average occupation numbers (left-handsideofEq.37)asafunctionofpopulationsize,N,onathree-mutation landscape as shown in Fig. 1 including back-mutations. Shown are data for a mutation rate of µ = 10 5 and replication rates of r = 0,r 0.049,r − 0 1 2 ≈ ≈ 0.010,r 0.002,r 0.059,r 0.051,r 0.012, and r 0.061. The time 3 4 5 6 7 ≈ ≈ ≈ ≈ ≈ is chosen as T = 157.5 which approximately maximizes N (T) Np (T). 0 0 (cid:104) (cid:105) − As N increases, the corrections obtained from stochastic simulations — N ( ), 0 N ( ), N (+), N ( ), N ((cid:3)), N (♦), N ( ), N ( ) — converge to the val×ues 1 2 3 4 5 6 7 (cid:13) ∗ (cid:53) (cid:52) predicted by the theory (solid lines). The dashed curves show the second order expansion, given by Eqs. 37 and A.1. The error bars are one standard error. (b) Finite-size correction to the mean population fitness. The average replica- tion rate in the population is linear in the occupation numbers, being equal to 1 (cid:80) r N (t), and so it too converges to the quasispecies result in the limit of N i i i a large population. that the variances obtained from stochastic simulations agree with the analytic expression given in Eq. 34 as shown in Fig. 4. 6 Discussion and Conclusion Although the theory described in this paper was developed to study the time- evolution of the occupation numbers in sequence space, we can immediately applytheseresultstoinvestigatewhichmutationalpathsindividualstake. This allowsustopredictthelargeN behavioroftheprobabilitythatapopulationwill 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.