Mutation-selection dynamics and error threshold in an evolutionary model for Turing Machines

Fabio Musso§ & Giovanni Feverati†

§ Departamento de Física, Universidad de Burgos, Plaza Misael Bañuelos s/n, 09001 Burgos, Spain
† Laboratoire de physique théorique LAPTH, CNRS, Université de Savoie, 9, Chemin de Bellevue, BP 110, 74941 Annecy le Vieux Cedex, France

arXiv:1101.3887v1 [q-bio.PE] 20 Jan 2011

Keywords: Darwinian evolution, in-silico evolution, mutation-selection, error threshold, Turing machines

Abstract

We investigate the mutation-selection dynamics for an evolutionary computation model based on Turing Machines that we introduced in a previous article [1].

The use of Turing Machines allows for very simple mechanisms of code growth and of code activation/inactivation through point mutations. Each value of the point mutation probability corresponds to a maximum amount of active code that can be maintained by selection, and the Turing machines that reach it are said to be at the error threshold. Simulations with our model show that the population of Turing machines evolves towards the error threshold.

Mathematical descriptions of the model point out that this behaviour is due more to the mutation-selection dynamics than to the intrinsic nature of the Turing machines. This indicates that the result is much more general than the model considered here and could also play a role in biological evolution.

1 Introduction

The study of "in silico" evolutionary models has increased significantly in recent times; see [2], [3], [4], [5], [6], [7], [8], [1], to give just some examples. The basic idea behind these models is to simulate the evolution of computer algorithms subject to mutation and selection procedures.
In this artificial evolution setting, the algorithms play the role of biological organisms and are selected on the basis of their ability to perform one or more prescribed tasks (replicate themselves, compute some mathematical function, etc.). The simulated algorithms are incomparably less complex than any biological organism. Still, the hope is that (at least some of) the phenomena observed in digital evolution models correspond to general behaviours of evolutionary systems. This does seem to happen in some cases: the emergence of parasitism in [2], quasi-species selection in [4], and the striking similarity between the C-value enigma [9] and the phenomenon of code bloat in evolutionary programming [10], [1].

One of the motivations for performing artificial evolution experiments is the continuously increasing computational power of modern computers. Nowadays, very fast multiprocessor computers are relatively cheap, and many scientific institutions have large facilities for parallel computation at their disposal. For example, one run of 50000 generations of a population of 300 Turing machines (TMs) of our evolutionary model takes about half a day per processor on an ordinary home computer. This holds for the highest value of the states-increase rate $p_i$; for lower values it takes considerably less. The long-term evolution experiment on E. coli directed by R. E. Lenski reached 40000 generations after almost 20 years [11], although the population in that experiment is much larger, of the order of $10^7$ cells. When population size is not a crucial parameter, digital evolution experiments can explore numbers of generations that are inaccessible to laboratory experiments with real organisms. To study evolutionary effects on such large time scales in real biological organisms, one has to resort to paleontological studies. However, such studies are hampered by the incompleteness of the fossil record and by the unrepeatability of the experiments. Repeatability, in particular, makes it easy to discriminate between effects due to adaptation and those simply due to drift.
These problems are overcome in laboratory experiments such as Lenski's, but at the price of reducing the environment to a Petri dish. Artificial evolution experiments allow one to explore larger time scales than laboratory experiments at much reduced cost, but at the higher price of replacing biological organisms with algorithms. Artificial evolution experiments have another big advantage: complete control over all the experimental settings. This makes a reductionist approach possible, in which the effects of the various mechanisms involved in the evolutionary dynamics are studied separately. Such an approach is very difficult to achieve with real organisms. Finally, as a last argument in favour of artificial evolution experiments, we cite one given by Maynard Smith [12]: "...we badly need a comparative biology. So far, we have been able to study only one evolving system and we cannot wait for interstellar flight to provide us with a second. If we want to discover generalizations about evolving systems, we will have to look at artificial ones."

As we said, even the most complicated computer algorithm is incomparably simpler than any biological organism. Moreover, typical artificial evolution experiments have a single ecological niche, and the interaction between the artificial organisms is often limited to the comparison of their performances. So, a very large distance separates artificial evolution experiments from biological evolution. For this reason, many biologists are skeptical about the biological relevance of results obtained in the digital framework; some of the objections typically raised are reported in [13]. Supporters of artificial evolution experiments reply that the observed results can actually be general phenomena of evolutionary systems, and therefore independent of the particular model under consideration. To test this hypothesis, one would like to compare the results obtained in the artificial evolution setting with real biological data. This is very hard to do for long-term evolutionary effects, which is exactly where artificial evolution models are most useful.
On the other hand, general evolutionary behaviours do emerge when the mutation-selection dynamics prevails over the peculiar characteristics of the evolving organism. When this is the case, the observed effects can be reproduced through a population genetic mathematical model. These models focus on the dynamics induced by the selection and mutation operators (under some working hypotheses) rather than on the specific details of how the organism functions. If successful, this procedure extends the validity of the results observed in the evolutionary model under consideration to all evolutionary models that use the same mutation and selection operators (under the same hypotheses). The problem of the biological relevance of the results of the artificial evolution experiment is thus turned into the problem of assessing the biological plausibility of the mutation and selection operators, and of the hypotheses used in the mathematical model.

In this paper we apply this strategy and derive a deterministic and a stochastic population genetic model of our evolutionary model for TMs. Our main aim is to show that the evolutionary dynamics pushes the TMs toward the error threshold [14]. The population genetic model is used to compute the value of the error threshold mathematically, and to show that this dynamical behaviour follows from quite mild hypotheses.

Following this program, the Materials and Methods section first briefly recalls our evolutionary model for TMs [1] and the Eigen error threshold concept. A deterministic population genetic model for our digital evolution model is introduced in the third subsection, "The deterministic model". Its stochastic counterpart, limited to the evolution of the best performing TMs, is given in the fourth subsection. In the Results section we report the results of the computer simulations and compare them with those predicted by the mathematical models. Finally, our concluding remarks are given in the Discussion section.

2 Materials and Methods

2.1 The evolutionary model

We basically use the same evolutionary programming model based on Turing Machines that was introduced in [1].
The following are the only differences between that model and the model we use in this article:
1. the TMs' movable head can only move right or left and can no longer stay still (this also affects the definition of the added state);
2. the TMs' tape is now circular, so that the TM head cannot exit from the tape.

The first choice allows us to save one bit of memory for each state of the TMs and, at the same time, makes our definition more similar to the original one [15]. The second choice seems to us the most convenient when dealing with finite tapes.

To make this article self-contained, we give a terse description of Turing machines and of the evolutionary programming model that we use.

Turing Machines are very simple symbol-manipulating devices that can be used to encode any feasible algorithm. They were invented in 1936 by Alan Turing [15] and used as abstract tools to investigate the problem of the computability of functions. For a complete treatment of this subject we refer to [16].

A Turing machine consists of a movable head acting on an infinite tape $T(t)$, see figure 1. The tape consists of discrete cells that can contain a 0 or a 1 symbol. The head has a finite number of internal states, which we denote by $N$ (in which case the TM is called an N-state TM). At any time $t$ the head is in a given internal state $s(t)$ and is located upon a single cell $k(t)$ of the infinite tape $T(t)$. It reads the symbol stored in the cell and, according to its internal state and the symbol read, performs three actions:
1. "write": writes a new symbol on the cell $k(t)$ ($T(t) \mapsto T(t+1)$);
2. "move": moves one cell to the right or to the left ($k(t) \mapsto k(t+1)$);
3. "call": changes its internal state to a new state ($s(t) \mapsto s(t+1)$).

Accordingly, a state can be specified by two "write-move-call" triplets, listing the actions to perform after reading a 0 or a 1 symbol, respectively. A distinguished state (the Halt state) stops the machine when called. The initial tape $T(0)$ is the input tape of the TM. The tape $T(\bar t)$ at the instant $\bar t$ when the machine stops is its output tape, that is, the result of executing the algorithm defined by the given TM on the input tape $T(0)$.
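The definition above can be turned into a small simulator. The Python sketch below is ours, not the authors' implementation. It assumes a hypothetical encoding in which states are numbered from 1, Halt is encoded as 0, and Left/Right are the displacements −1/+1. It also assumes that execution starts in state 1 with the head on cell 0 of an all-zero tape. It already uses the finite circular tape of 300 cells and the 4000-step cap adopted by the model. It also records which (state, symbol read) triplets are executed, which is how coding triplets are identified later in this section.

```python
from typing import List, Set, Tuple

HALT = 0                          # call value that stops the machine (assumed encoding)
LEFT, RIGHT = -1, 1               # head displacements
Triplet = Tuple[int, int, int]    # (write, move, call)
State = Tuple[Triplet, Triplet]   # actions after reading 0 and after reading 1


def run_tm(states: List[State], tape_len: int = 300,
           max_steps: int = 4000) -> Tuple[List[int], Set[Tuple[int, int]], bool]:
    """Run a TM on an all-zero circular tape.

    Returns the output tape, the set of executed (state, symbol) triplets,
    and whether the machine called Halt before hitting the step cap.
    """
    tape = [0] * tape_len
    head, state = 0, 1                      # hypothetical start: cell 0, state 1
    executed: Set[Tuple[int, int]] = set()
    steps = 0
    while state != HALT and steps < max_steps:
        symbol = tape[head]
        write, move, call = states[state - 1][symbol]
        executed.add((state, symbol))       # this triplet is coding
        tape[head] = write                  # "write"
        head = (head + move) % tape_len     # "move" on the circular tape
        state = call                        # "call"
        steps += 1
    return tape, executed, state == HALT
```

For example, `run_tm([((0, RIGHT, HALT), (1, LEFT, HALT))])` runs a one-state machine that halts after a single step and leaves the tape blank. Only its "read 0" triplet is executed.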
Not every TM defines an algorithm: many TMs never stop. Moreover, the halting problem, that is, the problem of establishing whether a TM will eventually stop when given a particular input tape, is undecidable. This means that there exist TMs for which it is impossible to predict whether they will eventually halt on a given input tape.

We have to introduce some restrictions on the definition of the TMs in our evolutionary model. Since we want to perform computer simulations, we need a tape of finite length, which we fix to 300 cells. The position of the head is taken modulo the length of the tape; that is, we consider a circular tape in which cell 1 comes after cell 300. Since it is quite easy to generate machines that run forever, we also need to fix a maximum number of time steps: a machine is forced to halt if it reaches 4000 steps.

We begin with a population of 300 1-state TMs of the following form

$$\begin{array}{c|c|ccc} \text{state} & \text{read} & \text{write} & \text{move} & \text{call} \\ \hline 1 & 0 & 0 & \text{move}_1 & \text{Halt} \\ & 1 & 1 & \text{move}_2 & \text{Halt} \end{array} \tag{1}$$

where $\text{move}_1$ and $\text{move}_2$ are fixed at random as Right or Left, and we let them evolve for 50000 generations. At each generation every TM undergoes the following three processes (in this order):
1. states-increase,
2. mutation,
3. selection and reproduction.

States-increase. In this phase, further states are added to the TM with a rate $p_i$. The new states are the same as (1), with the label 1 replaced by $N+1$, $N$ being the number of states before the addition. States-increase should clearly be considered a form of mutation (vaguely resembling insertion). We prefer to keep it separate because its effect is always neutral.

Mutation. During mutation, every entry of each state of the TM is randomly changed with probability $p_m$. The new entry is chosen at random among all the corresponding permitted values, excluding the original one. The permitted values are:
• 0 or 1 for the "write" entries;
• Right or Left for the "move" entries;
• the Halt state or an integer from 1 to the number of states $N$ of the machine for the "call" entries.

Selection and reproduction. In the selection and reproduction phase a new population is created from the current one (the old population). The number of offspring of a TM is determined by its "performance" and, to a minor extent, by chance.
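The states-increase and mutation operators just described can be sketched as follows. States are encoded as pairs of (write, move, call) triplets, with Halt = 0, Left = −1 and Right = +1. Two points are our own assumptions, not the paper's. First, since $p_i$ can exceed 1, it is read here as the expected number of states added per generation: the integer part is always added, plus one more state with probability equal to the fractional part. Second, the move entries of an added state are drawn at random, as for the initial machines (1).

```python
import random
from typing import List, Tuple

HALT, LEFT, RIGHT = 0, -1, 1
State = Tuple[Tuple[int, int, int], Tuple[int, int, int]]  # (on read 0, on read 1)


def random_basic_state(rng: random.Random) -> State:
    """A state of the form (1): rewrite the symbol read, move at random, halt."""
    return ((0, rng.choice((LEFT, RIGHT)), HALT),
            (1, rng.choice((LEFT, RIGHT)), HALT))


def states_increase(states: List[State], p_i: float, rng: random.Random) -> List[State]:
    """Append new states; p_i is treated as a mean number of additions (assumption)."""
    whole = int(p_i)
    n_new = whole + (1 if rng.random() < p_i - whole else 0)
    # The new states get labels N+1, N+2, ...; nothing calls them yet, so they are neutral.
    return states + [random_basic_state(rng) for _ in range(n_new)]


def mutate(states: List[State], p_m: float, rng: random.Random) -> List[State]:
    """Change every entry with probability p_m to a different permitted value."""
    calls = [HALT] + list(range(1, len(states) + 1))
    mutated = []
    for state in states:
        new_state = []
        for write, move, call in state:
            if rng.random() < p_m:
                write = 1 - write                      # the only other symbol
            if rng.random() < p_m:
                move = -move                           # Right <-> Left
            if rng.random() < p_m:
                call = rng.choice([c for c in calls if c != call])
            new_state.append((write, move, call))
        mutated.append(tuple(new_state))
    return mutated
```

Because the added states are only reachable through a later mutation of a "call" entry, `states_increase` never changes the output tape. This is the neutrality the text relies on.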
In the field of evolutionary programming, what we call performance is usually called fitness. In population genetics, however, fitness denotes the expected number of offspring of an individual (or the fraction of its offspring that reaches reproductive age). To avoid ambiguity, we reserve the word fitness for this meaning and use the word performance for the evolutionary programming one. The performance of a TM is a function that measures how well the output tape of the machine reproduces a given "goal" tape, starting from a prescribed input tape. We compute it as follows. The performance is initially set to zero. Then the output tape and the goal tape are compared cell by cell. The performance is increased by one for every 1 on the output tape that matches a 1 on the goal tape. It is decreased by 3 for every 1 on the output tape that matches a 0 on the goal tape.

As a selection process, we use what is known in the field of evolutionary algorithms as "tournament selection of size 2 without replacement". Two TMs are randomly extracted from the old population and run on the input tape, and each is assigned a performance value according to its output tape. The performance values are compared. The machine with the higher score creates two copies of itself in the new population, while the other one is eliminated (asexual reproduction). If the performance values are equal, each TM creates one copy of itself in the new population. The two TMs chosen for the tournament are removed from the old population (they are not replaced), and the process restarts until the old population is exhausted. Notice that this selection procedure keeps the total population size $N$ (with $N$ an even number) constant. From our point of view, this selection mechanism has two main advantages: it is computationally fast and quite simple to treat mathematically.

The choice of TMs to encode the algorithms in our evolutionary model was convenient for several reasons. The first reason is that any feasible algorithm can be encoded through a TM (Church-Turing thesis [16]), so that TMs are universal objects within the class of algorithms.
The second reason is that even TMs with a very small number of states can exhibit very complicated and unpredictable behaviour. This holds even if the input tape is filled only with zeroes, as in our case; see for example the busy beaver function [17]. Thanks to this property, the dynamics of our evolutionary model is very difficult to predict.

While developing the model, we were primarily interested in how variations in the length of the code affect the evolutionary dynamics. From this point of view, TMs present many advantages. The distinction between coding and non-coding triplets is unambiguous and very easy to verify. We define a triplet as non-coding (with respect to a given input tape) if mutations in its entries cannot affect the output tape of the TM; otherwise we call it coding. This definition is practically equivalent to saying that a triplet is coding if it is executed at least once when the TM runs on the given input tape, and non-coding if it is never executed. Thus, to identify coding triplets, one only has to run the TM on the prescribed input tape and mark the triplets that are executed. Another advantage is that the mechanism of state addition is completely neutral: the added states are always non-coding, so they cannot change the performance of the TM. At the same time, there is a simple mechanism of code activation. For example, a triplet of a non-coding state $s$ can be activated when a mutation in the call entry of a coding state changes its value to $s$. Mutations in the write and move entries of a coding triplet can also activate or inactivate triplets of the TM. Finally, TMs have the further advantage of being specified in terms of an atomic instruction: the state.

2.2 The Eigen error threshold

The error threshold concept was introduced in 1971 by Eigen in the context of his quasispecies model [14], [18]. The model describes the dynamics of a population of self-replicating polynucleotides of fixed length $L$, subject to mutation and under the constraint of constant population size.
Each polynucleotide $I^{(i)}$ is characterized by its replication rate $A_i$, its degradation rate $D_i$, and the probabilities $Q_{ji}$ of mutating into a different polynucleotide $I^{(j)}$ as a consequence of inexact replication. All these parameters are assumed to be fixed numbers, independent of time and of the population composition. The Eigen model then consists of a set of ODEs that determine the evolution of the frequency $\varphi_i$ of the polynucleotide $I^{(i)}$ in the total population:

$$\dot\varphi_i = \sum_j \left(A_j Q_{ij} - D_j\,\delta_{ij}\right)\varphi_j - \varphi_i \sum_j (A_j - D_j)\,\varphi_j, \tag{2}$$

where the sums run over all possible polynucleotide templates $I^{(j)}$. The polynucleotide $I^{(1)}$ is assumed to have a larger fitness than the others: $A_1 - D_1 > A_k - D_k$ for $k > 1$. This polynucleotide is usually called the master sequence, while the others are called mutants. Assume the following:
• mutation is exclusively due to point mutation;
• transversions are neglected;
• all transitions have the same probability of occurring;
• the point mutation probability is independent of the site.

Then we can identify our polynucleotides with binary chains of length $L$. The mutation probabilities $Q_{ji}$ then depend only on the point mutation probability $q$ and on the Hamming distance $d(i,j)$ between the binary chains $I^{(i)}$ and $I^{(j)}$:

$$Q_{ji} = q^{d(i,j)}(1-q)^{L-d(i,j)}.$$

Once the parameters $A_j$ and $D_j$ are assigned, one can study the asymptotic composition of the population as a function of the point mutation probability $q$. It turns out that, at least for some choices of the fitness landscape (see [19], [20], [21], [22], [23]), there is a sharp transition in the population composition near a particular value of $q$. This value is termed the error threshold. Below the error threshold, the population is organized as a cloud of mutants surrounding the master sequence. Above it, each polynucleotide is almost equally represented. Consider the thermodynamic limit, in which the chain length $L$ goes to infinity and the point mutation probability $q$ goes to zero so that the genomic mutation rate $p = qL$ stays finite. In this limit the transition is a genuine first-order phase transition [24], and the error threshold is mathematically well defined.
As a consequence, the model implies that natural selection can preserve the information content of the genome only if the mutation rate is lower than the error threshold (see [25]); beyond the error threshold, all the information content is lost. Consider a single-peak fitness landscape, i.e. $A_k - D_k = A_2 - D_2$ for all $k > 2$, in the thermodynamic limit. The system of equations (2) can then be reduced to a two-by-two system by introducing a collective variable $\varphi_M$ for the overall frequency of mutants in the population:

$$\varphi_M = \sum_{k=2}^{\infty} \varphi_k.$$

In the thermodynamic limit, the fidelity rate of the master sequence is $Q_{11} = e^{-p}$, and the probability of back mutation $Q_{1M}$ goes to zero. The Eigen equations then take the form:

$$\dot\varphi_1 = \left(A_1 e^{-p} - D_1\right)\varphi_1 - \varphi_1\left[(A_1 - D_1)\varphi_1 + (A_2 - D_2)\varphi_M\right], \tag{3}$$

$$\dot\varphi_M = A_1\left(1 - e^{-p}\right)\varphi_1 + (A_2 - D_2)\varphi_M - \varphi_M\left[(A_1 - D_1)\varphi_1 + (A_2 - D_2)\varphi_M\right]. \tag{4}$$

In this case, the error threshold $\bar P$ coincides with the lowest value of the mutation probability $P = 1 - e^{-p}$ for which the master sequence goes extinct in the asymptotic limit $t \to \infty$ [26]. Using the constraint $\varphi_1 + \varphi_M = 1$ in equation (3), we get the closed equation $\dot\varphi_1 = \varphi_1\left[A_1(1-P) - D_1 - (A_2 - D_2) - \big((A_1 - D_1) - (A_2 - D_2)\big)\varphi_1\right]$. Its nonzero equilibrium $\varphi_1^* = 1 - P/\bar P$ is positive only for $P < \bar P$, where

$$\bar P = \frac{(A_1 - D_1) - (A_2 - D_2)}{A_1}. \tag{5}$$

Observe that the infinite population limit removes genetic drift. The survival of the master sequence in the asymptotic limit, for mutation probabilities below the error threshold, is possible only for infinite populations. For finite populations, when the probability of reverse mutations is zero, genetic drift always pushes the population into its only absorbing state: the extinction of the master sequence. In the finite population case, however, the expected number of generations before the extinction of the master sequence grows by several orders of magnitude when the mutation probability drops below the error threshold (see [27], [28]). This is why the value of the error threshold predicted by the deterministic model also works for finite populations. The same effect is present in our model (see the next two sections and figure 12).
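The single-peak reduction can be checked numerically. The sketch below evaluates the threshold (5) and integrates equation (3), with $\varphi_M = 1 - \varphi_1$ and $e^{-p}$ written as $1 - P$, using plain explicit Euler steps. The rate constants in the usage are arbitrary illustrative values, not taken from the paper, and the step size and integration time are our own choices.

```python
def eigen_threshold(A1: float, D1: float, A2: float, D2: float) -> float:
    """Error threshold (5) for the single-peak fitness landscape."""
    return ((A1 - D1) - (A2 - D2)) / A1


def master_frequency(P: float, A1: float, D1: float, A2: float, D2: float,
                     t_end: float = 20.0, dt: float = 1e-3) -> float:
    """Integrate equation (3) with phi_M = 1 - phi_1 by explicit Euler steps.

    P = 1 - exp(-p) is the probability that a master sequence copy is mutated,
    so the term A1 * exp(-p) becomes A1 * (1 - P).
    """
    phi1 = 1.0                                   # start from a pure master population
    for _ in range(int(t_end / dt)):
        phiM = 1.0 - phi1
        mean = (A1 - D1) * phi1 + (A2 - D2) * phiM   # dilution term of (3)
        phi1 += dt * ((A1 * (1.0 - P) - D1) * phi1 - phi1 * mean)
    return phi1


# Illustrative rates: A1 = 10, A2 = 1, no degradation, so P_bar = 0.9.
P_bar = eigen_threshold(10, 0, 1, 0)
print(master_frequency(0.45, 10, 0, 1, 0))   # close to 1 - 0.45 / 0.9 = 0.5
print(master_frequency(0.95, 10, 0, 1, 0))   # master sequence dying out
```

Below $\bar P$ the integration settles at $1 - P/\bar P$. Above it, $\varphi_1$ decays towards zero, which is the transition described in the text.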
2.3 The deterministic model

In this section we describe a deterministic mutation-selection model with tournament selection of size two, and we obtain the corresponding error threshold. Our model of selection and reproduction is very different from that of the Eigen model. In particular, the number of offspring of each genotype is not constant: the performance landscape is fixed, but the fitness landscape changes in time, following the changes in the population composition. Nevertheless, when the probability of back mutations is neglected, one obtains a closed equation for the number of individuals with the best performance. The same happens for the Eigen model with the single-peak fitness landscape. Following the example of the Eigen model (see also [29]), we define the error threshold as the mutation probability that causes the extinction of the master sequence, with our selection model taken in the deterministic limit. In our case the master sequence is the best performance class.

Suppose that we have $M$ possible performance classes and a population of size $N$, and let $n_i$ denote the number of individuals in the $i$-th performance class. In the selection step we draw 2 individuals from the population without replacement and compare their performances. The individual with the higher performance is copied into the new population. With probability $f$ it also produces a second copy; with probability $1 - f$ the second copy instead belongs to the individual with the lower performance. When the two individuals have the same performance, both are passed to the new population. The two individuals are eliminated from the old population, and the process is repeated until the old population is exhausted and the new population is complete. Notice that $0 \le f \le 1$ must hold. The selection mechanism used in our TMs model corresponds to the particular choice $f = 1$.
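The generalized tournament with parameter $f$ is easy to simulate. The sketch below is our own illustration. It performs one selection step on a list of performance values and returns the parent of each offspring. A second helper gives the closed-form expected class sizes derived in the text as equation (6), with classes ordered by increasing performance. A quick Monte Carlo average can be compared against them.

```python
import random
from typing import List, Sequence


def tournament_round(perf: Sequence[float], f: float, rng: random.Random) -> List[int]:
    """One selection step: return the parent index of every offspring."""
    n = len(perf)
    if n % 2:
        raise ValueError("population size must be even")
    order = list(range(n))
    rng.shuffle(order)                       # random pairing without replacement
    offspring: List[int] = []
    for a, b in zip(order[0::2], order[1::2]):
        if perf[a] == perf[b]:
            offspring += [a, b]              # tie: each is copied once
            continue
        winner, loser = (a, b) if perf[a] > perf[b] else (b, a)
        offspring.append(winner)
        # The second copy goes to the winner with probability f, else to the loser.
        offspring.append(winner if rng.random() < f else loser)
    return offspring


def expected_counts(n: Sequence[float], f: float) -> List[float]:
    """Expected class sizes after selection, classes ordered by increasing performance."""
    N = sum(n)
    return [n_i * (1 + f * (sum(n[:i]) - sum(n[i + 1:])) / (N - 1))
            for i, n_i in enumerate(n)]
```

With `f = 1.0` this reduces to the TM model's rule (the winner always gets both copies). With `f = 0.0` selection disappears and every parent leaves exactly one copy.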
Under this selection mechanism, each individual belonging to the $i$-th class has a probability

$$P_2 = \frac{f}{N-1}\sum_{j<i} n_j$$

of making two copies of itself. Here classes are labelled in order of increasing performance, so that classes $j < i$ perform worse than class $i$. It has a probability

$$P_1 = \frac{1}{N-1}\left[(1-f)\sum_{j<i} n_j + n_i - 1 + (1-f)\sum_{j>i} n_j\right]$$

of making one copy of itself and, finally, a probability

$$P_0 = \frac{f}{N-1}\sum_{j>i} n_j$$

of making no copy at all. It follows that the expected number $n'_i$ of individuals in the $i$-th performance class after selection is

$$n'_i = n_i\left[1 + \frac{f}{N-1}\left(\sum_{j<i} n_j - \sum_{j>i} n_j\right)\right]. \tag{6}$$

Notice that $\sum_{i=1}^{M} n'_i = N$.

Now let us consider the mutation step. We assume that the individuals in each performance class $i$ share the same probability $Q_i$ of undergoing only neutral mutations, or no mutations at all. With a slight abuse of terminology, we call $Q_i$ the fidelity rate of the $i$-th performance class. Let $g_{ij}$ denote the probability that an individual in the $j$-th performance class gives rise to an individual in the $i$-th performance class as a result of a mutation. Here $g_{ii} = 0$, since neutral mutations are included in the fidelity rate $Q_i$. We could instead have defined $Q_i$ as the probability of undergoing no mutations at all, and $g_{ii}$ as the probability that the mutations that do occur are neutral. The alternative chosen here may seem clumsier, but it is better suited to our mathematical analysis. This mutation mechanism gives rise to the following deterministic discrete equation:

$$n''_i = n'_i Q_i + \sum_{j=1}^{M} (1 - Q_j)\, n'_j\, g_{ij}. \tag{7}$$

Since by definition $\sum_{i=1}^{M} g_{ij} = 1$, it also holds that $\sum_{i=1}^{M} n''_i = N$.

Suppose now that $g_{ij} \ll 1$ for $i > j$, that is, the probability of a mutation to a higher performance class is very small. Then the fraction of individuals undergoing a beneficial mutation in one generation is negligible. Let $s$ be the best occupied performance class at a given time, so that $n_s > 0$ and $n_i = 0$ for $i > s$, and suppose $1 < s < M$. From equation (6), it also holds that $n'_s > 0$ and $n'_i = 0$ for $i > s$. Then, from (7) and (6), we get:

$$n''_s = n'_s Q_s = n_s Q_s\left[1 + \frac{f}{N-1}\sum_{j<s} n_j\right] = n_s Q_s\left[1 + \frac{f(N - n_s)}{N-1}\right]. \tag{8}$$

The best performance class is stably populated if $n''_s = n_s$.
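The one-generation map (8) can be iterated directly to see the two possible fates of the best class. The sketch below uses $f = 1$, as in the TM model, and an illustrative $N = 100$. The two fidelity rates in the usage are arbitrary values chosen on either side of the threshold, not values from the paper.

```python
def best_class_map(n_s: float, Q_s: float, f: float, N: int) -> float:
    """One generation of equation (8) for the best occupied class."""
    return n_s * Q_s * (1.0 + f * (N - n_s) / (N - 1))


def iterate_best_class(n0: float, Q_s: float, f: float, N: int,
                       generations: int = 500) -> float:
    """Apply the map (8) repeatedly, starting from n0 best individuals."""
    n_s = n0
    for _ in range(generations):
        n_s = best_class_map(n_s, Q_s, f, N)
    return n_s


# f = 1, N = 100, all individuals initially in the best class.
print(iterate_best_class(100, 0.6, 1.0, 100))   # settles at a positive value
print(iterate_best_class(100, 0.4, 1.0, 100))   # decays towards zero
```

With these numbers the first run settles near 34 individuals and the second dies out. Both outcomes correspond to the fixed points of (8).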
The fixed-point condition $n''_s = n_s$ has two solutions. The first one is $n_s^{(1)} = 0$ and the second is

$$n_s^{(2)} = \frac{1}{f}\left[N(1+f) - 1 - \frac{N-1}{Q_s}\right]. \tag{9}$$

$n_s^{(2)}$ is greater than zero if

$$Q_s > \frac{1}{1 + f\frac{N}{N-1}} = \frac{1}{1+f} + O\!\left(\frac{1}{N}\right).$$

In that case $n_s = n_s^{(2)}$ is a sink and $n_s = 0$ is an unstable equilibrium, since the function $n''_s - n_s$ is positive for $n_s \in (0, n_s^{(2)})$ and negative for $n_s \in (n_s^{(2)}, N)$, as shown in figure 2. If

$$Q_s < \frac{1}{1 + f\frac{N}{N-1}},$$

then the only sink is $n_s = 0$. Hence, the error threshold is given by

$$\bar Q = 1 - \bar P = \frac{1}{1 + f\frac{N}{N-1}}. \tag{10}$$

Neglecting the $O(1/N)$ corrections, we obtain:

$$\bar P = \frac{f}{1+f}. \tag{11}$$

This is the same result that one gets from the Eigen model with the single-peak fitness landscape when $A_1 = (1+f)(A_2 - D_2 + D_1)$ (see equation (5) and also [26]).

With the previous argument we have shown that, after infinitely many generations, the best occupied performance class must satisfy $Q_s > \bar Q$, namely that $n_j = 0$ for all $j$ such that $Q_j < \bar Q$. We can now show that the index $s$ is actually the largest possible one, namely that there is no class $i$ such that $Q_s > Q_i > \bar Q$. Suppose that $g_{i+1,i} \neq 0$ for all $i = 1, \dots, M-1$. If, at a given generation, the $i$-th performance class is populated while the $(i+1)$-th is empty, then at the following generation we have

$$n''_{i+1} = \sum_{l \le i} (1 - Q_l)\, g_{i+1,l}\, n_l \left[1 + \frac{f}{N-1}\left(\sum_{j<l} n_j - \sum_{j>l} n_j\right)\right] \ge (1 - Q_i)\, g_{i+1,i}\, n_i \left[1 + \frac{f}{N-1}\sum_{j<i} n_j\right] > 0. \tag{12}$$

So a fraction of the population, possibly very small, progressively filters into higher performance classes. This process continues until one of two classes is reached: the last performance class $M$, or a class $s$ such that $Q_s > \bar Q$ and $Q_i < \bar Q$ for $i > s$. The asymptotic occupation number of this class is then given by the stable fixed point of equation (8), i.e. by equation (9).

A few observations about this result are in order. First, notice that, according to equation (12), the $(s+1)$-th performance class is populated at each generation by mutants of the $s$-th one.
As we said, if $g_{s+1,s}$ is small, this number is a tiny fraction of $n_s$ and can be neglected (as we did in equation (8)). So what we have really shown is that the $s$-th performance class is the last one with a significant occupation number. The actual value of this number depends mainly on how close $Q_s$ is to $\bar Q$.

The second point to take into account is that the time needed to populate the $s$-th class could be astronomical; it depends on the values of $g_{ij}$, $Q_i$ and $f$. In particular, to keep it reasonable, $g_{i+1,i}$ and $f$ must not be exceedingly small. It is also necessary that, for $i < s$, the fidelity rates $Q_i$ are neither smaller than $\bar Q$ nor too close to it. A natural assumption that rules this out is that $Q_i$ is a monotonically decreasing function of $i$.

As an illustrative example, figure 3 shows the results of a numerical simulation of the discrete system (6), (7). The parameters are $M = 40$, $N = 100$, $f = 10^{-3}$,

$$g_{ij} = \begin{cases} 1 - 10^{-6}, & \text{if } i = j = 1 \text{ or } j - i = 1,\\ 10^{-6}, & \text{if } i = j = 40 \text{ or } i - j = 1,\\ 0, & \text{otherwise,}\end{cases}$$

$$Q_i = (1 - 10^{-5})^{\sqrt{i^3}}, \qquad i = 1, \dots, 40, \tag{13}$$

and the initial state has all 100 individuals in the first performance class. Needless to say, this choice of parameters does not claim any degree of biological realism. The green points show the occupation numbers of the 40 performance classes after $2\cdot 10^6$ generations, while the red line connects those obtained after $10^6$ generations. The maximum difference between the occupation numbers of the same class at $2\cdot 10^6$ and $10^6$ generations is $5\cdot 10^{-5}$, which cannot be seen in the figure. This means that the population had almost reached its stable state after $10^6$ generations. The error catastrophe occurs when the fidelity rate (13) is less than the fidelity threshold (10). With the above choices, we have $Q_i > \bar Q$ for $i = 1, \dots, 21$ and $Q_i < \bar Q$ for $i = 22, \dots, 40$.
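The threshold classification and the predicted occupation of the last viable class follow directly from equations (9), (10) and (13). The short sketch below evaluates them for the parameters of this example. It does not rerun the $2\cdot 10^6$-generation simulation of figure 3.

```python
import math


def fidelity_threshold(f: float, N: int) -> float:
    """Equation (10): Q_bar = 1 / (1 + f N / (N - 1))."""
    return 1.0 / (1.0 + f * N / (N - 1))


def stable_occupation(Q_s: float, f: float, N: int) -> float:
    """Equation (9): nonzero fixed point of the best-class map (8)."""
    return (N * (1 + f) - 1 - (N - 1) / Q_s) / f


M, N, f = 40, 100, 1e-3
Q = [(1 - 1e-5) ** math.sqrt(i ** 3) for i in range(1, M + 1)]   # equation (13)
Q_bar = fidelity_threshold(f, N)
s = max(i for i in range(1, M + 1) if Q[i - 1] > Q_bar)          # last viable class
print(s, round(stable_occupation(Q[s - 1], f, N), 3))            # → 21 4.682
```

The value obtained this way for the 21st class matches the prediction quoted for this example.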
Given these values, we expect that, in the asymptotic limit, the performance classes from the 22nd on should be empty, while equation (9) predicts that the 21st should be occupied by 4.682 individuals. At the end of the simulation, the 21st performance class contains 4.686 individuals and the 22nd contains $2.02\cdot 10^{-4}$. The occupation numbers of the higher classes keep decreasing, by approximately 5 orders of magnitude per class.

The TMs critical number of coding states

We made two hypotheses in our deterministic model to find the value of the error threshold (11). The first hypothesis is that $g_{ij} \ll 1$ for $i > j$, that is, favourable mutations are extremely rare. This is a natural assumption in our TMs model, because mutations very often induce a big change in the output tape, which has a very small probability of being favourable. Moreover, in the next section we develop a stochastic model based on the same assumption. Its predictions agree well with the observed results (figure 12). The second relevant assumption behind the error threshold (11) is that the individuals in the best performance class $s$ share the same fidelity rate $Q_s$. We make the further assumption that, for TMs, the probability that a mutation in a coding triplet is neutral is also negligible. Each coding triplet has three entries, each mutated independently with probability $p_m$, so the fidelity rate of a TM with $N_c$ coding triplets is

$$Q = (1 - p_m)^{3N_c}, \tag{14}$$

since mutations in non-coding triplets are, by definition, neutral. It follows that, for a given value of $p_m$, the fidelity rate is determined only by the number $N_c$ of coding triplets of the TM. The assumption that the best performing TMs have the same fidelity rate is therefore equivalent to the assumption that they have the same number of coding triplets. Figure 4 shows that this assumption is very close to the truth within a given run, which is what we really need. Notice, however, that the relation between $s$ and $Q_s$ varies considerably among different runs.
Figure 5 makes this variation particularly evident: the number of coding triplets associated with the performance scores 47 and 48 varies by more than a factor of two.

Having established that the hypotheses behind the error threshold (11) are accurate for our model, we can use it to determine the maximum allowed number of coding triplets for a TM. In our model $f = 1$, so equation (11) places the error threshold at $\bar P = 1/2$. The mutation probability for a TM with $N_c$ coding triplets is

$$P = 1 - (1 - p_m)^{3N_c}. \tag{15}$$

Equating (15) to the error threshold gives the critical number of coding states for the TMs:

$$N_c^* = -\frac{\ln 2}{3\ln(1 - p_m)}. \tag{16}$$

This expression is represented by the thick black line in figure 6. According to our deterministic model, TMs with a number of coding states larger than $N_c^*$ are ultimately doomed to extinction.

2.4 The stochastic model

In this section we take into account the stochastic effects in our mutation and selection procedures.

Recall that the constant population size $N$ must be an even number. We introduce a stochastic model for the evolution of the number $n_s$ of individuals with the best performance value only. We consider the selection and mutation steps separately.

The selection step. Since we are only interested in the number $n_s$ of best individuals, we can put all the remaining $N - n_s$ individuals into a single class. Let us denote by the symbol "1" the individuals of the best performance class and by the symbol "0" all the others. We denote by $n'_s$ the number of individuals in the highest performance class in the new population. $n'_s$ is determined by the numbers of pairs 11, 10 and 00 obtained by extracting random pairs without replacement from the old population. Let $k$ be the number of 11 pairs, $l$ the number of 10 pairs and $m$ the number of 00 pairs. We then have

$$n_s = 2k + l, \qquad n'_s = 2(k + l), \qquad 2(k + l + m) = N. \tag{17}$$

Suppose the selection step is applied to a population with $n_s = 2k + l$ individuals in the best class. The probability of getting $n'_s = 2(k+l)$ individuals in the best class equals the probability of extracting $k$ pairs 11, $l$ pairs 10 and $m$ pairs 00 from a set containing $2k + l$ ones and $l + 2m$ zeroes.
This probability is given by:

$$P\big(\underbrace{(11)\cdots(11)}_{k}\,\underbrace{(10)\cdots(10)}_{l}\,\underbrace{(00)\cdots(00)}_{m}\big) = 2^l \binom{k+l+m}{k}\binom{l+m}{l} \Big/ \binom{2(k+l+m)}{2k+l}.$$

The terms have the following meaning:
• $2^l$ accounts for the fact that each of the $l$ pairs 10 can be obtained by extracting the 1 before the 0 or vice versa;
• $\binom{k+l+m}{k}$ is the number of possible distributions of the $k$ pairs 11 among the $k + l + m$ total pairs;
• $\binom{l+m}{l}$ is the number of possible distributions of the $l$ pairs 10 among the remaining $l + m$ pairs;
• $\binom{2(k+l+m)}{2k+l}$ is the number of possible distributions of the $2k + l$ symbols 1 among the $2(k + l + m) = N$ possible places.

Notice that
• $n'_s$ is always even,
• $n'_s \ge n_s$,
• $n'_s \le 2n_s$.

If we fix $n_s$ and $n'_s$ satisfying the above constraints, we can obtain $k$ and $l$ as functions of $n_s$ and $n'_s$:

$$k = \frac{2n_s - n'_s}{2}, \qquad l = n'_s - n_s.$$

Since the total population is fixed to $N$, we have:

$$2(k + l + m) = N \;\Rightarrow\; m = \frac{N - n'_s}{2}.$$

Hence, applying the selection procedure to a population with $n_s$ individuals in the best performance class yields $n'_s$ individuals in that class with probability:

$$P_{\text{rip}}(n_s \to n'_s) = \begin{cases} 0 & \text{if } n'_s < n_s \text{ or } n'_s > \min(2n_s, N), \\ 0 & \text{if } n'_s \text{ is odd}, \\ 2^{\,n'_s - n_s}\dbinom{N/2}{\frac{2n_s - n'_s}{2}}\dbinom{\frac{N - 2n_s + n'_s}{2}}{n'_s - n_s}\Big/\dbinom{N}{n_s} & \text{otherwise.}\end{cases}$$

The mutation step. Let us now introduce mutation into the model. We keep the two simplifying assumptions used for the deterministic model, namely:
1. TMs in the best performance class have the same number of coding triplets $N_c$.
2. Mutations in coding triplets are (almost) always deleterious.

By definition, mutations in non-coding triplets are neutral. Under these assumptions, let $n'_s$ be the number of best individuals before mutation. The probability of getting $n''_s = n'_s - k$ individuals after the mutation step is

$$P_{\text{mut}}(n'_s \to n''_s = n'_s - k) = \binom{n'_s}{k} P^k (1 - P)^{n'_s - k},$$

where $P$ denotes the probability that an individual in the best performance class undergoes at least one mutation in a coding triplet.
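The two transition laws can be combined into the Markov matrix of the best-class size, and the expected extinction time can be obtained from $(I - S)^{-1}c$. Both steps are described in the text. The sketch below is our own illustration, not the authors' code. It assumes $f = 1$, as in the TM model, and takes the per-individual loss probability $P$ directly as input (for a TM with $N_c$ coding triplets, $P = 1 - (1 - p_m)^{3N_c}$). It solves the linear system with a hand-written Gauss-Jordan elimination so that only the standard library is needed.

```python
from math import comb
from typing import List


def p_rip(n_s: int, n1: int, N: int) -> float:
    """Selection step (f = 1): probability of going from n_s to n1 best individuals."""
    if n1 % 2 or n1 < n_s or n1 > min(2 * n_s, N):
        return 0.0
    return (2 ** (n1 - n_s) * comb(N // 2, (2 * n_s - n1) // 2)
            * comb((N - 2 * n_s + n1) // 2, n1 - n_s) / comb(N, n_s))


def p_mut(n1: int, n2: int, P: float) -> float:
    """Mutation step: each best individual is lost independently with probability P."""
    if n2 < 0 or n2 > n1:
        return 0.0
    k = n1 - n2
    return comb(n1, k) * P ** k * (1 - P) ** (n1 - k)


def extinction_times(N: int, P: float) -> List[float]:
    """Expected generations to extinction, tau[n_s] for n_s = 0..N."""
    # S: Markov matrix restricted to the transient states n_s = 1..N.
    S = [[sum(p_rip(i, n1, N) * p_mut(n1, j, P) for n1 in range(N + 1))
          for j in range(1, N + 1)] for i in range(1, N + 1)]
    # Augmented system (I - S) tau = c, with c = (1, ..., 1).
    A = [[(1.0 if r == c else 0.0) - S[r][c] for c in range(N)] + [1.0]
         for r in range(N)]
    for col in range(N):                     # Gauss-Jordan with partial pivoting
        piv = max(range(col, N), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(N):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, N + 1):
                    A[r][c] -= factor * A[col][c]
    return [0.0] + [A[r][N] / A[r][r] for r in range(N)]
```

For a tiny population, `extinction_times(2, 0.5)` gives 4 generations from either non-absorbing state. In that case both states lead to extinction with probability $P^2 = 1/4$ per generation. Scanning $P$ through $N_c$ gives extinction times as a function of the number of coding triplets, which is the quantity the paper plots in figure 12.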
For a TM with $N_c$ coding triplets, this probability is

$$P = 1 - (1 - p_m)^{3N_c}.$$

The Markov matrix. If the total population $N$ is finite, then under our assumptions the best individuals always go extinct in a finite time. The expected number of generations $\tau$ before this happens can be computed using the Markov matrix $M$ of the process [30]. The entry $M_{ij}$ of the Markov matrix gives the probability that the system passes from its $i$-th state to its $j$-th one. In our case the state of the system is labelled by the number $n_s$ of individuals in the best performance class, and the entries of $M$ are

$$M_{n_s+1,\,n''_s+1} = \sum_{n'_s=0}^{N} P_{\text{rip}}(n_s \to n'_s)\, P_{\text{mut}}(n'_s \to n''_s), \qquad n_s, n''_s = 0, \dots, N.$$

The state $n''_s = 0$ is an absorbing state for $M$. The expected number of generations $\tau$ needed to reach it is computed as follows. Let $S$ be the matrix obtained by removing the first row and the first column, which correspond to the only absorbing state. Let $c$ be an $N$-dimensional vector whose entries are all one. The matrix $I - S$, where $I$ denotes the identity matrix, is invertible. If the Markov process begins in state $i$, the expected number of generations before extinction is

$$\tau = \left[(I - S)^{-1} c\right]_i. \tag{18}$$

Note that equation (18) assumes that the system evolves for an infinite number of generations. The expected extinction times versus the number of coding triplets are plotted in figure 12 for 5 different values of the mutation probability.

2.5 Simulation settings

In this subsection we give the parameter values adopted in our computer simulations.

We chose as goal tape the binary expansion of the fractional part of π (the dots are just a convenient separator):

0010010000.1111110110.1010100010.0010000101.1010001100.0010001101.0011000100.1100011001.
1000101000.1011100000.0011011100.0001110011.0100010010.1001000000.1001001110.0000100010.
0010100110.0111110011.0001110100.0000001000.0010111011.1110101001.1000111011.0001001110.
0110110010.0010010100.0101001010.0000100001.1110011000.1110001101.
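The scoring rule of the performance function can be applied to this goal tape directly. The sketch below strips the separators and implements the +1/−3 rule described earlier for the performance. It assumes the output tape and the goal tape are aligned cell by cell from the first cell. This alignment convention is our assumption.

```python
GOAL = (
    "0010010000.1111110110.1010100010.0010000101.1010001100.0010001101.0011000100.1100011001."
    "1000101000.1011100000.0011011100.0001110011.0100010010.1001000000.1001001110.0000100010."
    "0010100110.0111110011.0001110100.0000001000.0010111011.1110101001.1000111011.0001001110."
    "0110110010.0010010100.0101001010.0000100001.1110011000.1110001101."
)
goal = [int(ch) for ch in GOAL if ch in "01"]    # drop the '.' separators


def performance(output, goal):
    """+1 for every output 1 on a goal 1, -3 for every output 1 on a goal 0."""
    score = 0
    for out_bit, goal_bit in zip(output, goal):
        if out_bit == 1:
            score += 1 if goal_bit == 1 else -3
    return score


print(len(goal), performance(goal, goal))   # → 300 125
```

The goal tape fills exactly the 300-cell tape. A perfect copy scores one point per 1 in the goal, while an output tape of all 1s is heavily penalized.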
Since the performance only rewards 1s that match 1s on the goal tape, the maximum possible performance value equals the number of 1s in the goal tape, namely 125. We performed simulations with the following (approximate) values of the states-increase rate $p_i$ and the point mutation probability $p_m$:

$$p_i \in \{9.26\cdot10^{-5};\ 1.66\cdot10^{-4};\ 3.00\cdot10^{-4};\ 5.40\cdot10^{-4};\ 9.72\cdot10^{-4};\ 1.75\cdot10^{-3};\ 3.14\cdot10^{-3};\ 5.68\cdot10^{-3};\ 1.02\cdot10^{-2};\ 1.85\cdot10^{-2};\ 3.33\cdot10^{-2};\ 6.00\cdot10^{-2};\ 1.08\cdot10^{-1};\ 1.95\cdot10^{-1};\ 3.51\cdot10^{-1};\ 6.33\cdot10^{-1};\ 1.14\},$$

$$p_m \in \{4.91\cdot10^{-5};\ 8.10\cdot10^{-5};\ 1.34\cdot10^{-4};\ 2.21\cdot10^{-4};\ 3.64\cdot10^{-4};\ 6.01\cdot10^{-4};\ 9.91\cdot10^{-4};\ 1.64\cdot10^{-3};\ 2.70\cdot10^{-3};\ 4.44\cdot10^{-3};\ 7.35\cdot10^{-3}\}.$$

These values were chosen so that consecutive ones have a constant ratio. For each pair of values $p_i$, $p_m$, we performed 20 simulations, varying the initial seed of the C native random number generator, for a total of 3740 runs ($17 \times 11 \times 20$). Each simulation lasted 50000 generations.
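As a quick check of how these settings relate to equation (16), the sketch below evaluates the critical number of coding states $N_c^*$ for each listed $p_m$. It also prints the ratio between consecutive $p_m$ values. It uses `math.log1p` for accuracy at small $p_m$. The list holds the approximate values quoted above, so the resulting $N_c^*$ are approximate too.

```python
import math

p_m_values = [4.91e-5, 8.10e-5, 1.34e-4, 2.21e-4, 3.64e-4, 6.01e-4,
              9.91e-4, 1.64e-3, 2.70e-3, 4.44e-3, 7.35e-3]


def critical_coding_triplets(p_m: float) -> float:
    """Equation (16): N_c* = -ln 2 / (3 ln(1 - p_m))."""
    return -math.log(2) / (3 * math.log1p(-p_m))


for p_m in p_m_values:
    print(f"p_m = {p_m:.2e}  ->  N_c* = {critical_coding_triplets(p_m):.0f}")

# Consecutive values should have (approximately) a constant ratio.
ratios = [b / a for a, b in zip(p_m_values, p_m_values[1:])]
print([round(r, 3) for r in ratios])
```

By construction, $(1 - p_m)^{3N_c^*} = 1/2$, i.e. a TM with exactly $N_c^*$ coding triplets sits at the threshold $\bar P = 1/2$ of the $f = 1$ model.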