
Information Geometry and Population Genetics: The Mathematical Structure of the Wright-Fisher Model PDF

323 Pages·2017·4.477 MB·English

Understanding Complex Systems

Julian Hofrichter, Jürgen Jost, Tat Dat Tran

Information Geometry and Population Genetics
The Mathematical Structure of the Wright-Fisher Model

Springer Complexity

Springer Complexity is an interdisciplinary program publishing the best research and academic-level teaching on both fundamental and applied aspects of complex systems – cutting across all traditional disciplines of the natural and life sciences, engineering, economics, medicine, neuroscience, social and computer science.

Complex Systems are systems that comprise many interacting parts with the ability to generate a new quality of macroscopic collective behavior the manifestations of which are the spontaneous formation of distinctive temporal, spatial or functional structures. Models of such systems can be successfully mapped onto quite diverse “real-life” situations like the climate, the coherent emission of light from lasers, chemical reaction-diffusion systems, biological cellular networks, the dynamics of stock markets and of the internet, earthquake statistics and prediction, freeway traffic, the human brain, or the formation of opinions in social systems, to name just some of the popular applications.

Although their scope and methodologies overlap somewhat, one can distinguish the following main concepts and tools: self-organization, nonlinear dynamics, synergetics, turbulence, dynamical systems, catastrophes, instabilities, stochastic processes, chaos, graphs and networks, cellular automata, adaptive systems, genetic algorithms and computational intelligence.

The three major book publication platforms of the Springer Complexity program are the monograph series “Understanding Complex Systems” focusing on the various applications of complexity, the “Springer Series in Synergetics”, which is devoted to the quantitative theoretical and methodological foundations, and the “SpringerBriefs in Complexity” which are concise and topical working reports, case-studies, surveys, essays and lecture notes of relevance to the field.
In addition to the books in these two core series, the program also incorporates individual titles ranging from textbooks to major reference works.

Editorial and Programme Advisory Board

Henry Abarbanel, Institute for Nonlinear Science, University of California, San Diego, USA
Dan Braha, New England Complex Systems Institute and University of Massachusetts Dartmouth, USA
Péter Érdi, Center for Complex Systems Studies, Kalamazoo College, USA and Hungarian Academy of Sciences, Budapest, Hungary
Karl Friston, Institute of Cognitive Neuroscience, University College London, London, UK
Hermann Haken, Center of Synergetics, University of Stuttgart, Stuttgart, Germany
Viktor Jirsa, Centre National de la Recherche Scientifique (CNRS), Université de la Méditerranée, Marseille, France
Janusz Kacprzyk, System Research, Polish Academy of Sciences, Warsaw, Poland
Kunihiko Kaneko, Research Center for Complex Systems Biology, The University of Tokyo, Tokyo, Japan
Scott Kelso, Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, USA
Markus Kirkilionis, Mathematics Institute and Centre for Complex Systems, University of Warwick, Coventry, UK
Jürgen Kurths, Nonlinear Dynamics Group, University of Potsdam, Potsdam, Germany
Andrzej Nowak, Department of Psychology, Warsaw University, Poland
Ronaldo Menezes, Florida Institute of Technology, Computer Science Department, Melbourne, USA
Hassan Qudrat-Ullah, School of Administrative Studies, York University, Toronto, ON, Canada
Peter Schuster, Theoretical Chemistry and Structural Biology, University of Vienna, Vienna, Austria
Frank Schweitzer, System Design, ETH Zurich, Zurich, Switzerland
Didier Sornette, Entrepreneurial Risk, ETH Zurich, Zurich, Switzerland
Stefan Thurner, Section for Science of Complex Systems, Medical University of Vienna, Vienna, Austria

Understanding Complex Systems

Founding Editor: S. Kelso

Future scientific and technological developments in many fields will necessarily depend upon coming to grips with complex systems. Such systems are complex in both their composition – typically many different kinds of components interacting simultaneously and nonlinearly with each other and their environments on multiple levels – and in the rich diversity of behavior of which they are capable.

The Springer Series in Understanding Complex Systems series (UCS) promotes new strategies and paradigms for understanding and realizing applications of complex systems research in a wide variety of fields and endeavors. UCS is explicitly transdisciplinary. It has three main goals: First, to elaborate the concepts, methods and tools of complex systems at all levels of description and in all scientific fields, especially newly emerging areas within the life, social, behavioral, economic, neuro- and cognitive sciences (and derivatives thereof); second, to encourage novel applications of these ideas in various fields of engineering and computation such as robotics, nano-technology and informatics; third, to provide a single forum within which commonalities and differences in the workings of complex systems may be discerned, hence leading to deeper insight and understanding.

UCS will publish monographs, lecture notes and selected edited contributions aimed at communicating new findings to a large multidisciplinary audience.
More information about this series at http://www.springer.com/series/5394

Julian Hofrichter
Mathematik in den Naturwissenschaften
Max-Planck-Institut
Leipzig, Germany

Jürgen Jost
Mathematik in den Naturwissenschaften
Max Planck Institut
Leipzig, Germany

Tat Dat Tran
Mathematik in den Naturwissenschaften
Max Planck Institut
Leipzig, Germany

ISSN 1860-0832
ISSN 1860-0840 (electronic)
Understanding Complex Systems
ISBN 978-3-319-52044-5
ISBN 978-3-319-52045-2 (eBook)
DOI 10.1007/978-3-319-52045-2
Library of Congress Control Number: 2017932889

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Population genetics is concerned with the distribution of alleles, that is, variants at a genetic locus, in a population and the dynamics of such a distribution across generations under the influences of genetic drift, mutations, selection, recombination and other factors [57]. The Wright–Fisher model is the basic model of mathematical population genetics. It was introduced and studied by Ronald Fisher, Sewall Wright, Motoo Kimura and many other people. The basic idea is very simple. The alleles in the next generation are drawn from those of the current generation by random sampling with replacement. When this process is iterated across generations, then by random drift, asymptotically, only a single allele will survive in the population. Once this allele is fixed in the population, the dynamics becomes stationary. This effect can be countered by mutations that might restore some of those alleles that had disappeared. Or it can be enhanced by selection that might give one allele an advantage over the others, that is, a higher chance of being drawn in the sampling process. When the alleles are distributed over several loci, then in a sexually recombining population, there may also exist systematic dependencies between the allele distributions at different loci. It turns out that rescaling the model, that is, letting the population size go to infinity and the time steps go to 0, leads to partial differential equations, called the Kolmogorov forward (or Fokker–Planck) and the Kolmogorov backward equation. These equations are well suited for investigating the asymptotic dynamics of the process. This is what many people have investigated before us and what we also study in this book.

So, what can we contribute to the subject? Well, in spite of its simplicity, the model leads to a very rich and beautiful mathematical structure. We uncover this structure in a systematic manner and apply it to the model.
While many mathematical tools, from stochastic analysis, combinatorics, and partial differential equations, have been applied to the Wright–Fisher model, we bring in a geometric perspective. More precisely, information geometry, the geometric approach to parametric statistics pioneered by Amari and Chentsov (see, for instance, [4, 20] and for a treatment that also addresses the mathematical problems for continuous sample spaces [9]), studies the geometry of probability distributions. And as a remarkable coincidence, here we meet Ronald Fisher again. The basic concept of information geometry is the Fisher metric. That metric, formally introduced by the statistician Rao [102], arose in the context of parametric statistics rather than in population genetics, and in fact, it seems that Fisher himself did not see this tight connection. Another fundamental concept of information geometry is the Amari–Chentsov connection [3, 10]. As we shall argue in this book, this geometric perspective yields a very natural and insightful approach to the Wright–Fisher model, and with its help we can easily and systematically compute many quantities of interest, like the expected times when alleles disappear from the population. Also, information geometry is naturally linked to statistical mechanics, and this will allow us to utilize powerful computational tools from the latter field, like the free energy functional. Moreover, the geometric perspective is a global one, and it allows us to connect the dynamics before and after allele loss events in a manner that is more systematic than what has hitherto been carried out in the literature. The decisive global quantities are the moments of the process, and with their help and with sophisticated hierarchical schemes, we can construct global solutions of the Kolmogorov forward and backward equations.

Let us thus summarize some of our contributions, in addition to providing a self-contained and comprehensive analysis of the Wright–Fisher model.
• We provide a new set of computational tools for the basic quantities of interest of the Wright–Fisher model, like fixation or coexistence probabilities of the different alleles. These will be spelled out in detail for various cases of increasing generality, starting from the 2-allele, 1-locus case without additional effects like mutation or selection to cases involving more alleles, several loci and/or mutation and selection.
• We develop a systematic geometric perspective which allows us to understand results like the Ohta–Kimura formula or, more generally, the properties and consequences of recombination, in conceptual terms.
• Free energy constructions will yield new insight into the asymptotic properties of the process.
• Our hierarchical solutions will preserve overall probabilities and model the phenomenon of allele loss during the process in more geometric and analytical detail than previously available.

Clearly, the Wright–Fisher model is a gross simplification and idealization of a much more complicated biological process. So, why do we consider it then? There are, in fact, several reasons. Firstly, in spite of this idealization, it allows us to develop some qualitative understanding of one of the fundamental biological processes. Secondly, mathematical population genetics is a surprisingly powerful tool both for classical genetics and modern molecular genetics. Thirdly, as mathematicians, we are also interested in the underlying mathematical structure for its own sake. In particular, we like to explore the connections to several other mathematical disciplines.

As already mentioned, our book contains a self-contained mathematical analysis of the Wright–Fisher model. It introduces mathematical concepts that are of interest and relevance beyond this model.
Our book therefore addresses mathematicians and statistical physicists who want to see how concepts from geometry, partial differential equations (Kolmogorov or Fokker–Planck equations) and statistical mechanics (entropy, free energy) can be developed and applied to one of the most important mathematical models in biology; bioinformaticians who want to acquire a theoretical background in population genetics; and biologists who are not afraid of abstract mathematical models and want to understand the formal structure of population genetics.

Our book consists essentially of three parts. The first two chapters introduce the basic Wright–Fisher model (random genetic drift) and its generalizations (mutation, selection, recombination). The next few chapters introduce and explore the geometry behind the model. We first introduce the basic concepts of information geometry and then look at the Kolmogorov equations and their moments. The geometric structure will provide us with a systematic perspective on recombination. And we can utilize moment-generating and free energy functionals as powerful computational tools. We also explore the large deviation theory of the Wright–Fisher model. Finally, in the last part, we develop hierarchical schemes for the construction of global solutions in Chaps. 8 and 9 and present various applications in Chap. 10. Most of those applications are known from the literature, but our unifying perspective lets us obtain them in a more transparent and systematic manner.

From a different perspective, the first four chapters contain general material, a description of the Wright–Fisher model, an introduction to information geometry, and the derivation of the Kolmogorov equations. The remaining five chapters contain our investigation of the mathematical aspects of the Wright–Fisher model, the geometry of recombination, the free energy functional of the model and its properties, and hierarchical solutions of the Kolmogorov forward and backward equations.
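As a concrete illustration of the basic model (random genetic drift) that the first chapters introduce, the following Python sketch resamples a small population generation by generation, drawing each new generation from the current one by random sampling with replacement, until a single allele is fixed. It is a minimal simulation under stated assumptions and not the authors' method: the haploid population of fixed size, the function names `wright_fisher_step` and `simulate_until_fixation`, the initial allele counts, the seed, and the generation cap are all hypothetical choices made for illustration.

```python
import random


def wright_fisher_step(counts, rng):
    """Draw the next generation by sampling with replacement.

    counts[i] is the number of copies of allele i in the current
    generation; the population size sum(counts) stays constant.
    """
    population_size = sum(counts)
    alleles = list(range(len(counts)))
    # Each offspring picks a parent allele with probability
    # proportional to that allele's current count.
    offspring = rng.choices(alleles, weights=counts, k=population_size)
    new_counts = [0] * len(counts)
    for allele in offspring:
        new_counts[allele] += 1
    return new_counts


def simulate_until_fixation(counts, seed=0, max_generations=100_000):
    """Iterate the sampling step until one allele holds every copy.

    Returns the final counts and the number of generations used.
    max_generations is only a safety cap (an assumption of this sketch).
    """
    rng = random.Random(seed)
    generation = 0
    while max(counts) < sum(counts) and generation < max_generations:
        counts = wright_fisher_step(counts, rng)
        generation += 1
    return counts, generation


if __name__ == "__main__":
    final_counts, generations = simulate_until_fixation([10, 10, 10], seed=1)
    print("final allele counts:", final_counts)
    print("generations until fixation:", generations)
```

Because the sketch includes no mutation, an allele whose count reaches zero receives zero sampling weight and cannot reappear, so the loop ends with one allele holding the entire population.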
This book contains the results of the theses of the first [60] and the third author [113] written at the Max Planck Institute for Mathematics in the Sciences in Leipzig under the direction of the second author, as well as some subsequent work. Following the established custom in the mathematical literature, the authors are listed in the alphabetical order of their names. In the beginning, there will be some overlap with the second author’s textbook Mathematical Methods in Biology and Neurobiology [73]. Several of the findings presented in this book have been published in [61–64, 114–118].

The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007–2013)/ERC grant agreement no. 267087. The first and the third authors have also been supported by the IMPRS “Mathematics in the Sciences”.

We would like to thank Nihat Ay for a number of inspiring and insightful discussions.

Leipzig, Germany, Julian Hofrichter
Leipzig, Germany, Jürgen Jost
Leipzig, Germany, Tat Dat Tran

Contents

1 Introduction
  1.1 The Basic Setting
  1.2 Mutation, Selection and Recombination
  1.3 Literature on the Wright–Fisher Model
  1.4 Synopsis
2 The Wright–Fisher Model
  2.1 The Wright–Fisher Model
  2.2 The Multinomial Distribution
  2.3 The Basic Wright–Fisher Model
  2.4 The Moran Model
  2.5 Extensions of the Basic Model
  2.6 The Case of Two Alleles
  2.7 The Poisson Distribution
  2.8 Probabilities in Population Genetics
    2.8.1 The Fixation Time
    2.8.2 The Fixation Probabilities
    2.8.3 Probability of Having (k+1) Alleles (Coexistence)
    2.8.4 Heterozygosity
    2.8.5 Loss of Heterozygosity
    2.8.6 Rate of Loss of One Allele in a Population Having (k+1) Alleles
    2.8.7 Absorption Time of Having (k+1) Alleles
    2.8.8 Probability Distribution at the Absorption Time of Having (k+1) Alleles
    2.8.9 Probability of a Particular Sequence of Extinction
  2.9 The Kolmogorov Equations
  2.10 Looking Forward and Backward in Time
  2.11 Notation and Preliminaries
    2.11.1 Notation for Random Variables
    2.11.2 Moments and the Moment Generating Functions
    2.11.3 Notation for Simplices and Function Spaces
    2.11.4 Notation for Cubes and Corresponding Function Spaces
3 Geometric Structures and Information Geometry
  3.1 The Basic Setting
  3.2 Tangent Vectors and Riemannian Metrics
  3.3 Differentials, Gradients, and the Laplace–Beltrami Operator
  3.4 Connections
  3.5 The Fisher Metric
  3.6 Exponential Families
  3.7 The Multinomial Distribution
  3.8 The Fisher Metric as the Standard Metric on the Sphere
  3.9 The Geometry of the Probability Simplex
  3.10 The Affine Laplacian
  3.11 The Affine and the Beltrami Laplacian on the Sphere
  3.12 The Wright–Fisher Model and Brownian Motion on the Sphere
4 Continuous Approximations
  4.1 The Diffusion Limit
    4.1.1 Convergence of Discrete to Continuous Semigroups in the Limit N → ∞
  4.2 The Diffusion Limit of the Wright–Fisher Model
  4.3 Moment Evolution
  4.4 Moment Duality
5 Recombination
  5.1 Recombination and Linkage
  5.2 Random Union of Gametes
  5.3 Random Union of Zygotes
  5.4 Diffusion Approximation
  5.5 Compositionality
  5.6 The Geometry of Recombination
  5.7 The Geometry of Linkage Equilibrium States
    5.7.1 Linkage Equilibria in Two-Loci Multi-Allelic Models
    5.7.2 Linkage Equilibria in Three-Loci Multi-Allelic Models
    5.7.3 The General Case
6 Moment Generating and Free Energy Functionals
  6.1 Moment Generating Functions
    6.1.1 Two Alleles
    6.1.2 Two Alleles with Mutation
    6.1.3 Two Alleles with Selection
    6.1.4 n+1 Alleles
