ebook img

Critical population and error threshold on the sharp peak landscape for a Moran model PDF

100 Pages·2014·1.328 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Critical population and error threshold on the sharp peak landscape for a Moran model

MEMOIRS of the American Mathematical Society Volume 233 • Number 1096 (second of 6 numbers) • January 2015 Critical Population and Error Threshold on the Sharp Peak Landscape for a Moran Model Raphae¨l Cerf ISSN 0065-9266 (print) ISSN 1947-6221 (online) American Mathematical Society MEMOIRS of the American Mathematical Society Volume 233 • Number 1096 (second of 6 numbers) • January 2015 Critical Population and Error Threshold on the Sharp Peak Landscape for a Moran Model Raphae¨l Cerf ISSN 0065-9266 (print) ISSN 1947-6221 (online) American Mathematical Society Providence, Rhode Island Library of Congress Cataloging-in-Publication Data Cerf,Rapha¨el. Critical population and error threshold on the sharp peak landscape for a Moran model / Rapha¨elCerf. pages cm. – (Memoirs of the AmericanMathematicalSociety, ISSN 0065-9266; volume 233, number1096) Includesbibliographicalreferencesandindex. ISBN978-1-4704-0967-8(alk. paper) 1. Moleculardynamics–Mathematicalmodels. 2. Macromolecules–Mathematicalmodels. 3. Chromosomereplication–Mathematicalmodels. I.Title. QP517.M65C47 2015 547(cid:2).7–dc23 2014033062 DOI:http://dx.doi.org/10.1090/memo/1096 Memoirs of the American Mathematical Society Thisjournalisdevotedentirelytoresearchinpureandappliedmathematics. Subscription information. Beginning with the January 2010 issue, Memoirs is accessible from www.ams.org/journals. The 2015 subscription begins with volume 233 and consists of six mailings, each containing one or more numbers. Subscription prices for 2015 are as follows: for paperdelivery,US$860list,US$688.00institutionalmember;forelectronicdelivery,US$757list, US$605.60institutional member. Uponrequest, subscribers topaper delivery ofthis journalare also entitled to receive electronic delivery. If ordering the paper version, add US$10 for delivery withintheUnitedStates;US$69foroutsidetheUnitedStates. Subscriptionrenewalsaresubject tolatefees. Seewww.ams.org/help-faqformorejournalsubscriptioninformation. Eachnumber maybeorderedseparately;please specifynumber whenorderinganindividualnumber. Back number information. Forbackissuesseewww.ams.org/bookstore. Subscriptions and orders should be addressed to the American Mathematical Society, P.O. Box 845904, Boston, MA 02284-5904USA. All orders must be accompanied by payment. Other correspondenceshouldbeaddressedto201CharlesStreet,Providence,RI02904-2294USA. Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for useinteachingorresearch. Permissionisgrantedtoquotebriefpassagesfromthispublicationin reviews,providedthecustomaryacknowledgmentofthesourceisgiven. Republication,systematiccopying,ormultiplereproductionofanymaterialinthispublication is permitted only under license from the American Mathematical Society. Permissions to reuse portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink(cid:2) service. Formoreinformation,pleasevisit: http://www.ams.org/rightslink. Sendrequestsfortranslationrightsandlicensedreprintstoreprint-permission@ams.org. Excludedfromtheseprovisionsismaterialforwhichtheauthorholdscopyright. Insuchcases, requestsforpermissiontoreuseorreprintmaterialshouldbeaddresseddirectlytotheauthor(s). Copyrightownershipisindicatedonthecopyrightpage,oronthelowerright-handcornerofthe firstpageofeacharticlewithinproceedingsvolumes. MemoirsoftheAmericanMathematicalSociety (ISSN0065-9266(print);1947-6221(online)) ispublishedbimonthly(eachvolumeconsistingusuallyofmorethanonenumber)bytheAmerican MathematicalSocietyat201CharlesStreet,Providence,RI02904-2294USA.Periodicalspostage paid at Providence, RI.Postmaster: Send address changes to Memoirs, AmericanMathematical Society,201CharlesStreet,Providence,RI02904-2294USA. (cid:2)c 2014bytheAmericanMathematicalSociety. Allrightsreserved. Copyrightofindividualarticlesmayreverttothepublicdomain28years afterpublication. ContacttheAMSforcopyrightstatusofindividualarticles. (cid:2) ThispublicationisindexedinMathematicalReviewsR,Zentralblatt MATH,ScienceCitation Index(cid:2)R,ScienceCitation IndexTM-Expanded,ISI Alerting ServicesSM,SciSearch(cid:2)R,Research (cid:2) (cid:2) (cid:2) AlertR,CompuMathCitation IndexR,Current ContentsR/Physical, Chemical& Earth Sciences. ThispublicationisarchivedinPortico andCLOCKSS. PrintedintheUnitedStatesofAmerica. (cid:2)∞ Thepaperusedinthisbookisacid-freeandfallswithintheguidelines establishedtoensurepermanenceanddurability. VisittheAMShomepageathttp://www.ams.org/ 10987654321 201918171615 Contents Chapter 1. Introduction 1 Chapter 2. The Model 9 Chapter 3. Main Results 13 Chapter 4. Coupling 19 Chapter 5. Normalized Model 21 Chapter 6. Lumping 25 6.1. Distance process 25 6.2. Occupancy process 28 6.3. Invariant probability measures 30 Chapter 7. Monotonicity 33 7.1. Coupling of the lumped processes 33 7.2. Monotonicity of the model 35 Chapter 8. Stochastic Bounds 39 8.1. Lower and upper processes 39 8.2. Dynamics of the bounding processes 40 8.3. A renewal argument 41 8.4. Bounds on ν 43 Chapter 9. Birth and Death Processes 47 9.1. General formulas 47 9.2. The case of (Ztθ)t≥0 47 9.3. Persistence time 49 9.4. Invariant probability measure 51 Chapter 10. The Neutral Phase 55 10.1. Ancestral lines 55 10.2. Mutation dynamics 58 10.3. Falling to equilibrium from the left 61 10.4. Falling to equilibrium from the right 66 10.5. Discovery time 70 Chapter 11. Synthesis 79 Appendix A. Appendix on Markov Chains 81 Bibliography 85 Index 87 iii Abstract The goal of this work is to propose a finite population counterpart to Eigen’s model,whichincorporatesstochasticeffects. WeconsideraMoranmodeldescribing theevolutionofapopulationofsizemofchromosomesoflength(cid:3)overanalphabet of cardinality κ. The mutation probability per locus is q. We deal only with the sharp peak landscape: the replication rate is σ > 1 for the master sequence and 1 for the other sequences. We study the equilibrium distribution of the process in the regime where (cid:3)→+∞, m→+∞, q →0, m (cid:3)q →a∈]0,+∞[, →α∈[0,+∞]. (cid:3) We obtain an equation αφ(a) = lnκ in the parameter space (a,α) separating the regime where the equilibrium population is totally random from the regime where a quasispecies is formed. We observe the existence of a critical population size necessary for a quasispecies to emerge and we recover the finite population coun- terpart of the error threshold. Moreover, in the limit of very small mutations, we obtain a lower bound on the population size allowing the emergence of a quasis- pecies: if α < lnκ/lnσ then the equilibrium population is totally random, and a quasispecies can be formed only when α ≥ lnκ/lnσ. Finally, in the limit of very large populations, we recover an error catastrophe reminiscent of Eigen’s model: if σexp(−a) ≤ 1 then the equilibrium population is totally random, and a quasis- pecies can be formed only when σexp(−a) > 1. These results are supported by computer simulations. ReceivedbytheeditorMay19,2012,and,inrevisedform,November26,2012. ArticleelectronicallypublishedonMay19,2014. DOI:http://dx.doi.org/10.1090/memo/1096 2010 MathematicsSubjectClassification. Primary60F10,Secondary92D25. Key wordsand phrases. Moran,quasispecies,errorthreshold. The author thanks an anonymous Referee for his careful reading and his remarks, which helpedtoimprovethepresentation. Affiliationattimeofpublication: Universit´eParisSud,Math´ematique,Bˆatiment425,91405 OrsayCedex–France;email: [email protected]. (cid:3)c2014 American Mathematical Society v CHAPTER 1 Introduction In his famous paper [13], Eigen introduced a model for the evolution of a pop- ulationofmacromolecules. Inthismodel,themacromoleculesreplicatethemselves, yet the replication mechanism is subject to errors caused by mutations. These two basic mechanisms are described by a family of chemical reactions. The replication rateofamacromoleculeisgovernedbyitsfitness. AfundamentaldiscoveryofEigen is the existence of an error threshold on the sharp peak landscape. If the mutation rate exceeds a critical value, called the error threshold, then, at equilibrium, the populationiscompletelyrandom. Ifthemutationrateisbelowtheerrorthreshold, then, at equilibrium, the population contains a positive fraction of the master se- quence (the most fit macromolecule) and a cloud of mutants which are quite close to the master sequence. This specific distribution of individuals is called a quasis- pecies. ThisnotionhasbeenfurtherinvestigatedbyEigen,McCaskillandSchuster [15] and it had a profound impact on the understanding of molecular evolution [11]. Ithasbeenarguedthat, atthepopulationlevel, evolutionaryprocesses select quasispecies rather than single individuals. Even more importantly, this theory is supported by experimental studies [12]. Specifically, it seems that some RNA viruses evolve witharather high mutation rate, which is adjusted tobe close toan error threshold. It has been suggested that this is the case for the HIV virus [38]. Some promising antiviral strategies consist of using mutagenic drugs that induce anerrorcatastrophe [2,8]. Asimilar error catastrophecould alsoplay arolein the development of some cancers [36]. Eigen’s model was initially designed to understand a population of macro- molecules governed by a family of chemical reactions. In this setting, the number of molecules is huge, and there is a finite number of types of molecules. From the start, this model is formulated for an infinite population and the evolution is deterministic (mathematically, it is a family of differential equations governing the timeevolutionofthedensitiesofeachtypeofmacromolecule). Theerrorthreshold appears when the number of types goes to ∞. This creates a major obstacle if one wishes to extend the notions of quasispecies and error threshold to genetics. Biological populations are finite, and even if they are large so that they might be considered infinite in some approximate scheme, it is not coherent to consider situ- ations where the size of the population is much larger than the number of possible genotypes. Moreover, it haslong beenrecognizedthatrandom effectsplay amajor role in the genetic evolution of populations [24], yet they are ruled out from the startinadeterministicinfinitepopulationmodel. Therefore,itiscrucialtodevelop a finite population counterpart to Eigen’s model, which incorporates stochastic ef- fects. ThisproblemisalreadydiscussedbyEigen,McCaskillandSchuster[15]and morerecentlybyWilke[41]. Numerousworkshaveattackedthisissue: Demetrius, Schuster and Sigmund [9], McCaskill [27], Gillespie [19], Weinberger [40]. Nowak 1 2 1. INTRODUCTION and Schuster [32] constructed a birth and death model to approximate Eigen’s model. This birth and death model plays a key role in our analysis, as we shall see later. Alves and Fontanari [1] study how the error threshold depends on the population in a simplified model. More recently, Musso [29] and Dixit, Srivastava, Vishnoi [10] considered finite population models which approximate Eigen’s model when the population size goes to ∞. These models are variants of the classical Wright–Fisher model of population genetics. Although this is an interesting ap- proach, it is already a delicate matter to prove the convergence of these models towards Eigen’s model. We adopt here a different strategy. Instead of trying to prove that some finite population model converges in some sense to Eigen’s model, wetrytoprovedirectlyinthefinitemodelanerrorthresholdphenomenon. Tothis end, we look for the simplest possible model, and we end up with a Moran model [28]. Themodelwechoosehereisnotparticularlyoriginal, thecontributionofthis work is rather to show a way to analyze this kind of finite population model. Let us describe informally the model (see chapter 2 for the formal definition). We consider a population of size m of chromosomes of length (cid:3) over the alphabet {A,T,G,C}. We work only with the sharp peak landscape: there is one specific sequence, called the master sequence, whose fitness is σ > 1, and all the other sequences have fitness equal to 1. The replication rate of a chromosome is propor- tional to its fitness. When a chromosome replicates, it produces a copy of himself, which is subject to mutations. Mutations occur randomly and independently at each locus with probability q. The offspring of a chromosome replaces a chromo- some chosen at random in the population. The mutations drive the population towards a totally random state, while the replication favors the master sequence. These two antagonistic effects interact in a complicated way in the dynamics and it is extremely difficult to analyze precisely the time evolution of such a model. Let us focus on the equilibrium distribution of the process. A fundamental problem is to determine the law of the number of copies of the master sequence present in the population at equilibrium. If we keep the parameters m,(cid:3),q fixed, there is little hope to get useful results. In order to simplify the picture, we consider an adequate asymptotic regime. In Eigen’s model, the population size is infinite from the start. The error threshold appears when (cid:3) goes to ∞ and q goes to 0 in a regime where (cid:3)q = a is kept constant. We wish to understand the influence of the population size m, thus we use a different approach and we consider the following regime. We send simultaneously m,(cid:3) to ∞ and q to 0 and we try to understand the respective influence of each parameter on the equilibrium law of the master sequence. By the ergodic theorem, the average number of copies of the master sequence at equilibrium is equal tothe limit, as the timegoesto∞, ofthetimeaverageofthenumber ofcopiesofthemastersequence present through thewhole evolutionof the process. Inthe finite population model, the number of copies of the master sequence fluctuates with time. Our analysis of thesefluctuationsreliesonthefollowingheuristics. Supposethattheprocessstarts with a population of size m containing exactly one master sequence. The master sequence is likely to invade the whole population and become dominant. Then the master sequence will be present in the population for a very long time without interruption. We call this time the persistence time of the master sequence. The destruction of all the master sequences of the population is quite unlikely, nevertheless it will happen and the process will eventually land in the neutral 1. INTRODUCTION 3 region consisting of the populations devoid of master sequences. The process will wander randomly throughout this region for a very long time. We call this time thediscoverytimeofthemastersequence. Becausethecardinalityofthepossible genotypesisenormous, themastersequenceisdifficulttodiscover, neverthelessthe mutationswilleventuallysucceedandtheprocesswillstartagainwithapopulation containing exactly one master sequence. If, on average, the discovery time is much larger thanthe persistence time, then theequilibrium statewill be totallyrandom, while a quasispecies will be formed if the persistence time is much larger than the discovery time. Let us illustrate this idea in a very simple model. 1− θ θ 1 1 2 2 1 1 2 2 ··· 2 2 ··· (cid:0) (cid:0) (cid:0) (cid:0) (cid:0) (cid:0) (cid:0) (cid:0) (cid:0) 0 1 2 i−1 i i+1 (cid:3)−1 (cid:3) Figure 1. Random walk example We consider the random walk on {0,...,(cid:3)} with the transition probabilities de- pending on a parameter θ given by: θ θ 1 p(0,1)= , p(0,0)=1− , p((cid:3),(cid:3)−1)=p((cid:3),(cid:3))= , 2 2 2 1 p(i,i−1)=p(i,i+1)= , 1≤i≤(cid:3)−1. 2 The integer (cid:3) is large and the parameter θ is small. Hence the walker spends its time either wandering in {1,...,(cid:3)} or being trapped in 0. The state 0 plays the roleofthequasispecieswhiletheset{1,...,(cid:3)}playstheroleoftheneutralregion. With this analogy in mind, the persistence time is the expected time of exit from 0, it is equal to 2/θ. The discovery time is the expected time needed to discover 0 starting for instance from 1, it is equal to 2(cid:3). The equilibrium law of the walker is the probability measure μ given by 1 θ μ(0) = , μ(1) = ··· = μ((cid:3)) = . 1+θ(cid:3) 1+θ(cid:3) We send (cid:3) to ∞ and θ to 0 simultaneously. If θ(cid:3) goes to ∞, the entropy factors win and μ becomes totally random. If θ(cid:3) goes to 0, the selection drift wins and μ converges to the Dirac mass at 0. In order to implement the previous heuristics, we have to estimate the per- sistence time and the discovery time of the master sequence in the Moran model. For the persistence time, we rely on a classical computation from mathematical genetics. Supposewestartwithapopulationcontainingm−1copiesofthemaster sequence and another non master sequence. The non master sequence is very un- likelytoinvade the whole population, yet ithas asmall probability todoso, called the fixation probability. If we neglect the mutations, standard computations yield that, in a population of size m, if the master sequence has a selective advantage of σ,thefixationprobabilityofthenonmastersequenceisroughlyoforder1/σm (see for instance [31, Section6.3]). Now the persistence time can be viewed as the time needed for non master sequences to invade the population. This time is approxi- mately equal to the inverse of the fixation probability of the non master sequence,

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.