ORIGINAL ARTICLE doi:10.1111/j.1558-5646.2012.01595.x MULTISCALE MODEL OF CRISPR-INDUCED COEVOLUTIONARY DYNAMICS: DIVERSIFICATION AT THE INTERFACE OF LAMARCK AND DARWIN LaurenM.Childs,1 NicoleL.Held,2 MarkJ.Young,3 RachelJ.Whitaker,4,5 andJoshuaS.Weitz6,7 1SchoolofBiologyandSchoolofMathematics,GeorgiaInstituteofTechnology,310FerstDr,Atlanta,Georgia30332 2DepartmentofMicrobiology,UniversityofIllinoisatUrbana-Champaign,601S.GoodwinAvenue,Urbana,Illinois61801 3ThermalBiologyInstituteandDepartmentofPlantSciencesandPlantPathology,MontanaStateUniversity,307Plant BioscienceBuilding,Bozeman,Montana59717 4DepartmentofMicrobiologyandInstituteforGenomicBiology,UniversityofIllinoisatUrbana-Champaign,601 S.GoodwinAvenue,Urbana,Illinois61801 5E-mail:[email protected] 6SchoolofBiologyandSchoolofPhysics,GeorgiaInstituteofTechnology,310FerstDr,Atlanta,Georgia30332 7E-mail:[email protected] ReceivedJuly27,2011 AcceptedJanuary19,2012 DataArchived:Dryaddoi:10.5061/dryad.gc2260qm TheCRISPR(ClusteredRegularlyInterspacedShortPalindromicRepeats)systemisarecentlydiscoveredtypeofadaptiveimmune defenseinbacteriaandarchaeathatfunctionsviadirectedincorporationofviralandplasmidDNAintohostgenomes.Here,we introduceamultiscalemodelofdynamiccoevolutionbetweenhostsandvirusesinanecologicalcontextthatincorporatesCRISPR immunityprinciples.WeanalyzethemodeltotestwhetherandhowCRISPRimmunityinduceshostandviraldiversificationand themaintenanceofmanycoexistingstrains.Weshowthathostsandvirusescoevolvetoformhighlydiversecommunities.We observethepunctuatedreplacementofexistentstrains,suchthatpopulationshaveverylowsimilaritycomparedoverthelong term.However,intheshortterm,weobserveevolutionarydynamicsconsistentwithbothincompleteselectivesweepsofnovel strains(assinglestrainsandcoalitions)andtherecurrenceofpreviouslyrarestrains.Coalitionsofmultipledominanthoststrains arepredictedtoarisebecausehoststrainscanhavenearlyidenticalimmunephenotypesmediatedbyCRISPRdefensealbeitwith differentgenotypes.Weclosebydiscussinghowourexpliciteco-evolutionarymodelofCRISPRimmunitycanhelpguideefforts tounderstandthedriversofdiversityseeninmicrobialcommunitieswhereCRISPRsystemsareactive. KEY WORDS: Evolutionarybiology, host–parasiteinteractions, immunedefense, microbialecology, viralevolution. TheCRISPR(ClusteredRegularlyInterspacedShortPalindromic example, plasmids and viruses (Mojica et al. 2005; Barrangou Repeats) system is a recently discovered type of adaptive im- etal.2007;Brounsetal.2008;Deveauetal.2008;Horvathand munesystemwhichdefendsagainstforeigngeneticmaterial,for Barrangou2010).Importantly,theCRISPRsystemispurported tobethemeansbywhichsomebacteriaandarchaeaevadeviral infection and lysis in the environment (Andersson and Banfield Re-useofthisarticleispermittedinaccordancewiththeTermsand 2008;Heldetal.2010;Heidelbergetal.2009).Aswedescribe Conditions set out at http://wileyonlinelibrary.com/onlineopen# OnlineOpenTerms below,themoleculardetailsofhowtheCRISPRsystemoperates (cid:2)C 2012TheAuthor(s).Evolution(cid:2)C 2012TheSocietyfortheStudyofEvolution. 2015 Evolution66-7:2015–2029 LAUREN M. CHILDS ET AL. andhowvirusesevadeitaretopicsofintensivestudy.Nonethe- rulesofgenomicchangeassociatedwiththeCRISPRsystem.We less,thefactthathostswithanoperativeCRISPRsystemundergo dosotofurthertheoreticalunderstandingoftwoquestions.First, directedchangestotheirgenomewithrespecttotheintroduction arethemolecularmechanismsassociatedwiththeCRISPRsys- offoreigngeneticmaterialposesachallengetotheoreticalefforts temsufficienttoleadtoandmaintainviralandhostdiversityand tounderstandthebasisforcoevolutionary-induceddiversification acomplexhost–viralcommunity(AnderssonandBanfield2008; amonghostsandviruses.Nearlyalltheoriesofevolutionarydy- Heidelberg et al. 2009; Held et al. 2010)? Second, what are the namicshaveincommontwotenetsofDarwinianevolution:first, evolutionarymechanismsbywhichdirectedandundirectedmuta- changestoorganismalgenomes,forexample,mutations,areran- tionalmechanismsremaininbalance,incaseswherecoexistence dom(LuriaandDelbruck1943;LederbergandLederberg1953); isobserved(Heidelbergetal.2009;Heldetal.2010)? second,successoforganismsdependsontheirecologicalfitness A few other models have already made inroads in charac- (Lande1976;Geritzetal.1997;NowakandSigmund2004).The terizingtheeffectthatCRISPRdefensemayhaveonecological CRISPR system suggests that a new class of models are neces- and evolutionary dynamics. First, He and Deem (2010) utilized sarytodescribehost–viruscoevolutionthatliesattheinterfaceof an immunological-based approach in which viral production is DarwinianandLamarckianevolution(KooninandWolf2009). uncoupled from host density (and hence is less concerned with As noted above, the CRISPR system utilizes a form of ecologicallydrivendynamics).Thatmodelconcludedthatspacers genome-level imitation that permits a microbial cell to direct should be more diverse in the leading edge and also, that coex- genomicchangesthatmaybebeneficialtoitssurvivalagainstin- istence is possible among diverse strains. Second, Levin (2010) vadingelements(e.g.,Soreketal.2008;HorvathandBarrangou largely avoided the issue of coevolution, to examine ecologi- 2010; Marraffini and Sontheimer 2010a; Vale and Little 2010). calcompetitionbetweenstrainsthatpossessCRISPRimmunity CRISPRlocihavebeenidentifiedin40%ofbacteriaand90%of versus those that possess receptor-based immunity. The present archaea(Grissaetal.2007).Inbrief,theCRISPRsystemworks model aims to unite these two perspectives: (1) by utilizing an asfollows:bacteriaandarchaeamayhavemultipleCRISPRloci, explicit density-dependent ecological formalism for host–viral containingasetofCRISPR-associated(Cas)genesandarepeat- interactions,suchasLevin(2010);(2)byexaminingthecoupling spacerregion(Soreketal.2008;Jansenetal.2002;Makarovaetal. between the ecological dynamics of strains and the evolution- 2009). This region has “spacers,” that is, genetic subsequences arychangeofthegenomicstateofstrains,suchasHeandDeem usually 20–50 nucleotides long that match to the protospacers (2010).Insodoing,thepresentmodeltracksthedynamicsofboth found in extrachromosomal elements such as viruses, plasmids, hostandviralstrainstatesaswellasdensities.Athirdmodel,by and transposons, which are separated by repeats (Bolotin et al. Haerteretal.(2011),alsopresentsasimilarapproach,albeitwith 2005; Pourcel et al. 2005; Marraffini and Sontheimer 2010a). a focus on spatially-mediated interactions between viruses and Therepeat-spacerregionsaretranscribedasRNA.Mediatedby hosts. theCasproteins,theseCRISPRRNAsconferimmunityagainst Here,weanalyzecoevolutionary-induceddynamicswherein virusesandplasmidsbytargetinghomologousstretchesofDNA hostspossessmultiplespacersandvirusespossessmultipleproto- and/or RNA. Successful recognition of foreign genetic material spacers.Weobservethathighlydiverseassemblagesemergefrom (i.e., via Watson–Crick base pairing with specific subsequences lowdiversityinitialconditions.Theemergenceandmaintenance known as protospacers) can lead to repression and/or digestion ofdiversityisduetoaseriesofinvasionsbyvirusesandhosts.Di- oftheforeigngeneticmaterial(Haleetal.2009;Marraffiniand versityismaintainedoverthelongterm,butthisdiversityreflects Sontheimer2010a).However,theprecisemolecularmechanisms thepunctuatedemergenceofnovelhostandviralstrainsthathave for immunity or interference and acquisition of new spacers re- relativelyshortlifetimes.Hence,wefindthatpopulationsareof- mainsafocusofcontinuedresearch(e.g.,Haurwitzetal.2010; tenhighlysimilaronshorttime-scalesbuthighlydissimilarover MarraffiniandSontheimer2010b;Haleetal.2009;vanderOost longtime-scales.Wealsoobservethreetypesofevolutionarydy- etal.2009).Thehostgenomeevolvesbypartialimitationofviral namicsthatdriveshort-termchangesinourmodel:(1)invasion genomes (or plasmids) for which it has survived exposure (see by rare strains with fitness advantages; (2) recurrence of rare, Fig.1B).Incontrast,virusesthatinfectahostcellandavoidde- older strains that gain fitness advantages due to changes in the tectionbytheCRISPRsystemandotherviralimmunitysystems geneticstatesofotherstrains;(3)invasionbycoalitionsofstrains evolveviaundirectedmutation(seeFig.1A). with identical immune phenotypes but distinct genotypes. We Inthisarticle,weintroduceamodelintendedtocapturethe notethatcoevolutionarydrivendiversificationisnotinevitablein principlesofhost–virusinteractionsandcoevolutionviaCRISPR suchmodels,andpointoutconditionsthatfavorCRISPR-induced immunity in an explicit ecological context. The model utilizes eliminationofviruses,whichmaybeofinterestinbioengineering a multiscale approach to combine density-dependent ecological applications.Additionally,weobservethatCRISPRimmunityis dynamicswithevolutionarychangesinformedbythemolecular dominated by the most recently acquired spacers, an emergent 2016 EVOLUTION JULY2012 MULTISCALE MODEL OF CRISPR-INDUCED COEVOLUTIONARY DYNAMICS Figure 1. SchematicoftheDarwinianandLamarckiancomponentsofevolutionintheCRISPRmodel.(A)Undirectedmutationofviruses following successful infection leads to replacement with a novel protospacer within the viral genomes. New protospacers can occur anywhereintheprotospacerset.(B)Directedmutationofhostsleadstoinclusionofanovelspacerwithinthehostgenome.Newspacers areaddedattheleadingend.Note:Wesimplifythedynamicsofspacerstatechangebyassumingthemaximumnumberofspacersper straintypeisconstant.Whenthemaximumnumberisreached,theadditionofaspacerattheleadingendisaccompaniedbydeletion ofaspaceratthetrailingend. feature of our simulations. Hence, we predict that only the first Table 1. Descriptionandvaluesofparametersinmultiscaleeco- few spacers play a role in shaping the selective forces driving evolutionarysimulations.Detailsofhowtheseparametersarein- host–viralcoevolutionevenwhenthespacerlocusiscomprised tegratedinthemodelareexplainedinthemaintext. ofmanyspacers.Weshowthattheacquisitionrateofnewspacers Model isastrongerdeterminantofthecomplexityoftheresultingcom- component Parameter Meaning Values munitythanisthefailurerateofhoststoprotectagainstviruses forwhichCRISPRimmunityisalreadypresent.Finally,wedis- Molecular p CRISPRfailure 10−5 probability cusshowthiscoevolutionarymodelframeworkcanbeutilizedto q Newspacer 10−5 helpidentifythosefactorsdrivingCRISPR-inducedcoevolution acquisition intheenvironment. probability Ecological r Growthrate(1/h) 1 Models K Carryingcapacity 105.5 The coevolutionary model presented here is comprised of three (1/mL) parts:(1)ecological;(2)molecular;and(3)evolutionary.Thefull β Burstsize 50 φ Adsorptionrate 10−7 modelintegratesthesethreecomponentstogethertosimulatethe (mL/h) dynamicinteractionsbetweendiversehostsandviruses.Inbrief, m Viraldecayrate(1/h) 0.1 hostandviraldensitiesaredeterminedbyecologicalrulesofin- Evolutionary μ Mutationrate 5×10−7 teractionthatincludehostreproductionanddeath,viralinfection ρ Densitycutoff(1/mL) 0.1 c of hosts, and viral deactivation outside of hosts. The molecular componentdetermineswhetherviralinfectionleadstohostlysis, viraldeactivation,orspacerintegration.Theevolutionarycompo- et al. 2005). More broadly, multiscale eco-evolutionary models nentintroducesnewhostandviralstrainsandtheirgeneticstates ofthiskindhavebeenutilizedelsewhere,forexample,foodweb (see Fig. 1). The dynamical steady state in the model is shaped dynamics(LoeuilleandLoreau2005)andinfluenza-hostdisease bytheassumptionsbuiltintothemodelaswellasthequantitative dynamics(Koelleetal.2006). valuesofmodelparameters(seebelowandTable1).Thedetails ofthesecomponentsandofthecomputationalschemeusedtoim- ECOLOGICAL COMPONENT plementthemaredescribedbelow.Themodelframeworkbuilds Weconsideracommunitycomprisedofhosts(eitherbacteriaor uponanearlierefforttostudycoevolutionarydynamicsbetween archaea), viruses, and implicitly modeled resources. The densi- bacteriaandphagesbasedupontheevolutionofenveloperecep- tiesofhostsandviruseschangebasedonthefollowingecolog- torstateswithinbacteriaandtailfiberstateswithinphages(Weitz icalevents.First,hostscandividegivensufficientresourcesand EVOLUTION JULY2012 2017 LAUREN M. CHILDS ET AL. theycanalsodie.Here,wefocusonarathersimplifiedecolog- MOLECULAR COMPONENT OF CRISPR IMMUNITY icalcontext,whereresourcesareconsideredimplicitlyandhost Theimmunedefenseofahosttoviralinfectionisbasedonse- populations would increase to their carrying capacity in the ab- quence matches between host and viral genomes. The immune senceofviruses.Viralpopulationsincrease(andhostpopulations state of a host is denoted as S =(s ,s ,...,s ) where s is 1 2 u i decrease) due to infection and lysis of hosts. Viral populations theithspacerofu spacersintheCRISPRlocus.Inreality,mul- decrease due to spontaneous deactivation in the environment, a tiple CRISPR locimayexist within agivenhost,however,here process thought to be characterized by a single time-scale (De we only analyze a single locus. We simplify the dynamics of PaepeandTaddei2006).Viruspopulationsalsodecreasedueto spacerstatechangebyassumingthemaximumnumberofspac- unsuccessfulinfections.Wedenote N asthedensityofhostsof ersperstrainisconstantandthatspacersarealwaysaddedtothe i strainiandV asthedensityofvirusesofstrain j.Eachhoststrain leading end. When the maximum is reached, the addition of a j hasauniquegenomicstatewhichwedenotebyS,corresponding spacertotheleadingendisaccompaniedbydeletionofaspacer i tothesetofspacersitcontainsthatconferitwithCRISPR-derived atthetrailingend.Spacersaredrawnfromprotospacers,thatis, immunity.Eachviralstrainhasauniquegenomicstatewhichwe smallsubsequenceswithintheviralgenome.Assuch,wedenote denotebyG ,correspondingtothesetofprotospacersitcontains the genomic state of the virus relevant to CRISPR immunity as j forwhichhostsmayormaynotbeimmune.Hostsreproduceat G =(g1,g2,...,gv), where gj is the jth protospacer of v pro- a maximum per-capita rate ofr with a carrying capacity of K. tospacersinthevirus.Throughoutthisanalysis,weconsiderall i Virusesinfecthostsatarateφ .Here,weonlyconsidertwopos- undirectedmutationstobedrawnfromaninfinitenumberofal- ij sibleoutcomesforaviralinfection:(1)thehostdiesandnewviral leles. This assumption is supported by studies that suggest that particlesareproduced;(2)thehostdisablestheviralgenomeand only a single base pair mismatch undermines CRISPR immu- (possibly)modifiesitsowngenomeinadirectedfashion.Finally, nity(Barrangouetal.2007).Otherimmunitymodelsmayfollow virusesdecayatadensity-independentrateofm.Togetherthese fromrelaxingthiscondition.Here,CRISPRimmunityisdefined rulesleadtothefollowingdynamicalequations: asfollows: Hostdivisionandmortality (cid:2) (cid:3)(cid:4) (cid:5) (cid:15) ⎛ (cid:9) ⎞ 1 if s =g ∈G, ⎜ Ni⎟ M(S,G)= i j (3) dNi = riNi⎜⎝1− i ⎟⎠ 0 if otherwise. dt K Inwords, M(S,G)=1ifthespacersetinthehostincludes Virallysiswhennotimmune (cid:2) (cid:9)(cid:13) (cid:3)(cid:4) (cid:14) (cid:5) (1) at least one perfect match to a protospacer in the virus, oth- −(1−q) 1−M(Si,Gj) φijNiVj erwise M(S,G)=0. The CRISPR immune mechanism is not j perfect(Barrangouetal.2007).Weexpecterrorswillbeoftwo V(cid:2)irallysisw(cid:3)(cid:4)henimmun(cid:5)e types,falsenegativesandfalsepositives.Forfalsenegatives,the (cid:9) −p M(S,G )φ N V , CRISPRsystemmaynotidentifyaviralgenomeeventhoughit i j ij i j possessesaspacerwhichmatchesaprotospacerinthevirus.For j false positives, the host may randomly acquire a spacer match- Virionreleasefromnonimmunehosts ing a viral protospacer during an interaction (see below) which (cid:2) (cid:3)(cid:4) (cid:5) dV (cid:9)(cid:13) (cid:14) brings immunity along with it. Both CRISPR and non-CRISPR j =(1−q)β 1−M(S,G ) φ N V dt i j ij i j mechanisms may be involved. We model these types of errors i quantitatively as follows. If M(S,G)=1 (the host is immune Virionreleasefromimmunehosts (cid:2) (cid:3)(cid:4) (cid:5) (cid:9) to the virus via CRISPR defense), then two events can happen: + pβ M(Si,Gj)φijNiVj (2) (1) immune defense with probability 1− p, via which the host i survives and the virus is eliminated; (2) stochastic failure with Viralinfection (cid:9)(cid:2) (cid:3)(cid:4) (cid:5) Virald(cid:2)e(cid:3)a(cid:4)c(cid:5)tivation probability p, via which the host is lysed by the virus lead- − φijNiVj − mVj . ing to a burst of progeny viruses. If M(S,G)=0 (the host is i not immune to the virus via CRISPR defense), then two events In these equations, M(S,G ) denotes whether a host with canhappen:(1)hostlysiswithprobability1−q andsubsequent i j spacerstate S isimmune(M =1)ornotimmune(M =0)toa burst of progeny viruses; (2) host survival with probability q i viruswithprotospacerstateG .Thedetailsofhowsuchimmunity in which the virus is eliminated. We assume that both p and q j is determined is explained in the following section, along with are small, that is, p,q (cid:4)1, as described in Model parameters discussionsofthemeaningoftheimmunityparameters pandq. below. 2018 EVOLUTION JULY2012 MULTISCALE MODEL OF CRISPR-INDUCED COEVOLUTIONARY DYNAMICS EVOLUTIONARY COMPONENT Wesimulatetheecologicalandmolecularinteractionsofthehosts Undirectedmutationsofviralprotospacerscanoccuruponsuc- andviruses(seeeqs.1–2)deterministicallyusingode45inMatlab. cessfulinfectionofahost.VirusesthatevadeCRISPRdefenses Populationdensitieschangeuntil:(1)ahostorvirusstraingoes andotherhostdefensesexploithostcellularmachineryandpro- extinct;(2)amutationeventoccurs,eitheroftheundirected(viral) duceβvirions(seeeq.(2)).Errorsinreplicationcanleadtothe or directed (host) type; or (3) the simulation reaches a defined modification of one of the protospacer alleles in a given virion timepointfordataoutputandrecalculationofmutationratesthat withaper-alleleprobabilityμ(e.g.,thealleleindicatedbyanar- occuratperiodicintervals.Whenanyoftheseeventsoccur,the rowinFig.1A).Theundirectedmutationofavirusprotospacerset simulationispausedandthestrainmutationrates,whichdepend isdenotedasG →G(cid:6).Notethatinimplementingthemodel,we on the continually varying strain abundances, are recalculated donotconsidersimultaneousmutationsorrecombinationmech- (see Supporting Information for details.). Note that we use the anismsasmeansforintroducingvariationinprotospacerstates. term “mutation” here to denote the insertion of a sequence into Hence, if viruses are produced at a rate b per unit time in the theCRISPRlocusofahostorthechangeinsequenceofaviral entire system, then the expected number of mutations per unit protospacer. This process is repeated until one of the following timeisbμv,wherevisthenumberofprotospacerspervirus.The occurs:allhoststrainsgoextinct,allviralstrainsgoextinct,orthe stochastictimingoftheseeventsisdeterminedviarandomsam- simulationreachesthemaximumrunningtime(generally2500h pling from an exponential distribution with mean time between in the model). Simulations are run with 100 replicates, unless mutationsof1/(bμv).Duetochangesinhostandviralpopula- the computation per replicate is excessive in which case 75 or tionsthataffecttheviralbirthrate,thisrateisrecalculatedduring 25replicatesareused. thesimulation(seeSupportingInformationfordetails). Strain extinction occurs when the population density of a Directed mutation can occurwhen a host identifies andin- strainfallsbelowourcriticalpopulationthreshold,ρ ;thisactsas c tegrates a new protospacer into its CRISPR locus, where it is anabsorbingstateforstrains.Throughaneventfunctioninode45, denotedasaspacer(seeFig.1B).Duringeveryhost–viralinter- thesimulationispaused,andthesystemofordinarydifferential action, there is a small probability q of acquiring a new spacer equations (ODEs) is reduced by removing the equation for the throughuptakeofaprotospacer.Dependinguponthepreviousim- strainwhichhasfallenbelowthecutoff.Mutationeventscanoc- munestateofthehost,theadditionofaprotospacermaychange cur upon replication by viruses (via an undirected mechanism) the immune state of the host with respect to the identified and anduponvirusinfectionofahost(viaadirectedmechanism)(see imitatedgenotype.Inotherwords,thehostwillbecomeimmune Fig.1).Mutationaleventscauseanadditionofanewstrainand to all viruses which contain the protospacer that was integrated thustheadditionofanewODEtothesystem.Allmutantstrains intoS(cid:6)asaspacer.Ifthehostalreadycontainedamatchingproto- aregivenaninitialdensity10%greaterthanρ .Thetimeuntilthe c spacer,thisnewspacerdoesnotprovideanyadditionalimmunity. nextmutationaleventiscalculatedusingtheGillespiealgorithm. Ifahostdidnotpreviouslycontainamatchingspacer,thenewly Sincethetimetothenextmutationaleventisstochastic,replicate addedprotospacerallowsthehosttosurvive.Thisoftenleadsto simulationswillnotgiveidenticalresults.Inthecaseofviralmu- a selective advantage for this host strain. To summarize: proto- tation, agivenstrain ofvirusis randomlyselectedtoundergoa spacer integration to the host can occur with rate q during any mutationeventwithprobabilityinproportiontotheinstantaneous unsuccessfulattemptbyavirustoinfectahost,regardlessifthe growthrateofthatstrain.Similarly,inthecaseofhostacquisition hostpreviouslyhadimmunitytotheattackingvirus.Recallthat ofaspacer,agivenhoststrainisrandomlyselectedtoacquirea theadditionofaspacerattheleadingedgeisaccompaniedbythe new spacer in proportion to its instantaneous rate of successful lossofasinglespaceratthetrailingedgeoftheCRISPRlocus defenseevents.Dataoutputoccursatregularintervalsthroughout when the locus has a maximum number of spacers considered thesimulation.Aftereachevent—strainextinction,strainmuta- (eightinoursimulations).Additionally,notethatahostneedonly tion, or data output—the strain mutation rates are recalculated. differbyasinglespacerfromallotherhoststrainstobeconsid- Additional details of the simulation procedure are found in the ereditsownstrain.Allnovelmutantstrainsareintroducedinthe SupportingInformation. simulation,regardlessofwhethertheyhaveaselectiveadvantage ornot. MODEL PARAMETERS The choice of model parameters will vary depending on the SIMULATION PROTOCOL CRISPR system of interest. In general, we consider ecological Webeginoursimulationswithasinglehoststrainandasingleviral parameterstypicalofEscherichiacolianditsphages(DePaepe strain along with their respective spacer and protospacer states. andTaddei2006)andmolecularparametersconsistentwithsmall Ourinitialhoststrainissusceptibletotheinitialviralstrainand, errorratesinCRISPRimmunity, p(cid:4)1andq (cid:4)1.Thevalueof thus,doesnotcontainaspacermatchingaprotospacerofthevirus. p is based on work in Streptococcus thermophilus for which p EVOLUTION JULY2012 2019 LAUREN M. CHILDS ET AL. canbeconsideredanefficiencyofplatingofvirusesonCRISPR immunehostsandcanrangefrom10−4to10−7(Barrangouetal. 2007).Thevalueforq isbasedonworkinthesamesystemfor whichacquisitionofresistancetovirulentbacteriophagesoccurs rarely, q ≈10−6 (Barrangou et al. 2007; Horvath et al. 2008; Deveauetal.2008).Notethatinthismodel,thevalueq denotes thesuccessfulintegrationofanovelspacer,andhence,directed mutationofthehost. Giventhevariationinherentinviralandhostdynamics,we further restrict our attention to model parameterizations which obey the following two conditions: (1) viruses eventually die outwhen infecting immune hosts; (2) viruses coexist with non- immune hosts. Given small error rates and large burst sizes and the definitions of the ecological and molecular compo- nents, these conditions can be written compactly as: Kφβ >1 m and 1 <β< 1 (seetheSupportingInformation).Notethatac- 1−q p tual ecological parameters remain poorly known for all but the mostwellstudiedoflaboratoryhost–virussystems.Furthermore, whenselectinghostsandvirusestomodel,wenotethatparame- terchoicesarenotindependent.Forexample,viralmortalityrates andtheirproductionrates(burstsizedividedbylatentperiod)are positivelycorrelatedforsomephages(i.e.,phageswhichproduce morevirionsdegradefasterandthosewhichproducefewervirions aremorestable)(DePaepeandTaddei2006).Parametervalues that are used as baselines for all simulations (unless otherwise noted)arelistedinTable1. Thoughthemodelframeworkcanhandlearbitrarynumbers Figure 2. Dynamics and diversification of multiple spacer– of spacers and protospacers, we focus computational efforts on protospacermodel(eightspacers,10protospacers).(A)Viralpop- caseswhenu ≤8spacersandv ≤10protospacers.Thechoiceof ulation dynamics (green online) and host population dynamics spacerandprotospacernumberensuresthatthenumberofspacers (black) show that population densities undergo fluctuations. (B) islessthanthenumberofprotospacers,asisthecasebiologically. Viralstraincount(greenonline)andhoststraincount(black)show Further,ourchoiceismadeasaconcessiontoefficientnumerical thediversificationintomultiplehostandviralstrains.Thesegraphs simulation of the model, whose simulation time increases with showresultsfromasinglerepresentativesimulation(outof100 protospacer number. Computational time and power also limits replicates). thenumberofreplicatesitisfeasibletoconsider. spacer sites for which hosts have CRISPR immunity may gain somefitnessadvantage.Inthismodel,hosttypesemergethatare CRISPRimmunetomultiple(butnotnecessarilyall)virusesand, Results likewise,viraltypesemergethatinfectsome(butnotnecessarily VIRAL AND HOST DIVERSIFICATION IN A MULTIPLE all)hosts(seeFig.S1). SPACER, MULTIPLE PROTOSPACER MODEL Inthissystem,complexcoevolutionarydynamicsunfold(see Here, we examine the dynamics of a host–virus community in Fig.2A). Notethatsincethehostgenerationtimeis∼1hinthis whicheachhostpossessesmultiple spacersandeachviruspos- model(seeTable1),wewillrefertodynamicsintheequivalent sesses multiple protospacers. We find that hosts rapidly acquire scale of generations. In the first ∼50 generations, the densities CRISPR immunity through directed incorporation of spacers. of hosts and viruses oscillate around a dramatically increasing Viruses mutate randomly at one of multiple protospacer sites average.Thenumberofhostandvirusstrainsinitiallyincreases, so that not all viral mutations are immediately beneficial. Non- often exceeding dozens and sometimes hundreds of strains (see beneficial viral mutations may arise if viral mutations occur at Fig. 2B). After the system passes transient dynamics, the virus protospacersitesforwhichnohostpossessesCRISPRimmunity. populationexceedsthehostpopulationinbothdensityandstrain In contrast, viral strains that have mutations of specific proto- count(seeFig.2). Typicallyby∼750generations,theCRISPR 2020 EVOLUTION JULY2012 MULTISCALE MODEL OF CRISPR-INDUCED COEVOLUTIONARY DYNAMICS finding is important as it points out that multiscale eco- evolutionaryCRISPRmodelsmaybeappropriateforthestudyof host–virusdynamicsinnaturalenvironments(wherecoexistence maybeofinterest)orhost–virusdynamicsinindustrialcontexts (whereviraleliminationmaybeagoal). DIRECTED MUTATION OF HOSTS CHANGES HOST POPULATION SIZE AND CONTENT OVER THE LONG TERM We find that the strain composition changes over the course of asimulation,despitethemaintenanceofhighdiversitythrough- out. In other words, strains arise, exist for some period of time typicallybetween0and400h,andarelost(seeFig.S2).Toquan- titatively compare the strain composition of the host population overtime,weemploytheMorisita–Hornsimilarityindex(Wolda Figure 3. Incorporationofspacerscauseschangesinhostpopu- 1981)whichdefinesthesimilarityofcommunitiesattimest and 1 lationsizeandhostpopulationcontent.Ecologicalsimilaritybe- t takingintoaccountthetypesofstrainsandtheirabundances: 2 tween the whole host population at two time points using the Morisita–Horn index which takes into account both abundance (cid:9)S andtype(seeeqs.4–5).Timeintervalsof2hareused.Thecolor 2 n n 1j 2j barindicatessimilarityfromblue(lowsimilarity)tored(highsim- (cid:2) = j=1 , (4) ilarity).Thediagonalisthecomparisonofonecommunityagainst N N (ψ +ψ ) 1 2 1 2 itselfandhencehasperfectsimilarity(darkred).Communitiessig- where nificantlyseparatedintimeareblueindicatingnosimilarity(see bottomleftofthefigure).Theverticalbarsindicateanincreasein (cid:9)S theaveragenumberofspacersperhost.Theaveragenumberof n2 ij spacers,sismarkedabovethegraphandsaturatesatamaximum ψ = j=1 . (5) ofs=8.Theinsetisanenlargedversionoft=1500tot=1700. i N2 i Thisgraphshowsresultsfromasinglerepresentativesimulation (outof100replicates). Inequations(4–5), Sisthetotalnumberofuniquestrainsfound inoneorbothcommunities,N isthetotalnumberofindividuals i spacerlocusofallhoststrainshasacquiredafullarrayofeight incommunityi,andn isthenumberofindividualsofstrain jin ij spacers.Subsequentstrainsalsohaveafulllocus.Thereafter,the communityi (Wolda1981).Further,notethatψ isthesimilarity i averagenumberofhostandvirusstrains(from100replicates)re- of community i, and is equivalent to the probability that two mainsrelativelyconstant,althoughthedensityandabundanceof randomlychosenindividualsinthatcommunityareofthesame anyparticularstraininanyparticularsimulationchangesdramati- species. The index functions as follows: the index is zero when cally.Cross-correlationanalysiswillbeconsideredinafollow-up thereisnooverlapofstraintypesbetweentimepoints;itislow workgiventheinterestinphaselagsofconsumersandresources (nearzero)whenthereissomeoverlapofstraintypesbutatvastly withineco-evolutionarydynamicalsystemssuchasthis(Yoshida differentabundances;itishigh(nearone)whenthereisoverlap etal.2007). of many strains and these strains exist at similar abundances; it Aviralstraindoesnotnecessarilysufferanextinctionwhen isonewhenthereisexactlythesamestrainsatexactlythesame a host acquires immunity to that viral strain, because there are abundance(i.e.,comparisonofatimepointtoitself). other host strains present which that viral strain may be able to Intheabsenceofmutation,ourchoiceofparametersallows infect (see Fig. S1). However, evolutionarily induced extinction hostsandvirusestostablycoexist(seetheSupportingInforma- ofvirusesdoesoccurwhenundirectedmutationdoesnotgenerate tion).Althoughsuchacaseisbiologicallyunfeasible,itisinstruc- viralstrainsthatcanevadeCRISPRimmunitytoindividualhosts tivetoconsidertheoretically.Insuchacasewithoutmutation,the or a coalition of hosts. For example, if μ, the mutation rate of content,orstraintypes,ofthepopulationwouldnotchange,but ourviruses,istoosmall(e.g.,μ<10−8givenparametersutilized the population density of viruses and hosts would change over here)thenthereisthepossibilitythattherearetoofewviralin- time, such that the Morisita–Horn index, (cid:2), would never reach fections that lead to novel viral strains with the ability to evade zerosolongashostsandvirusespersist. CRISPR immunity (see Table S1). Such viral extinctions can For our multiscale model that includes both directed and also be ecologically induced even for larger values of μ. This undirected mutation, we calculate (cid:2) for all pairs of recorded EVOLUTION JULY2012 2021 LAUREN M. CHILDS ET AL. timepointsinoursimulation,leadingtothematrixinFigure3, Wefindthatthedominanthoststrainsdieoutandarereplaced wherereddenoteshigh(cid:2) andbluedenoteslow(cid:2).Asshownin bynewstrains(asdefinedinourmodel)withdifferentcharacter- Figure3,wefindrapidturnoverofhoststrainsovertime-scales isticevolutionarydynamics:(1)incompleteselectivesweeps;(2) thatspanthesimulation.Directedmutationofahostcanprovide negativefrequency-dependentselection;and(3)clonalcompeti- thenewhoststrainwithimmunitytoagreaterproportionofthe tionofstrainswiththesameimmunephenotype.Basedonthese virus population, allowing the host strain to increase in relative dynamics,oneofthesethreetypesofstraincohortsdominateat abundance.Weobserverepeatedinstancesinwhichthehostpop- any one time: (1) a newly evolved resistant strain, (2) an older ulationissimilartoitselfoverashortperiodoftime(seeinsertto strainmaintainedinthepopulationatlowabundance,or(3)mul- Fig.3).Instancesofshort-termsimilarityaretypicallycorrelated tiplestrainsresistanttosimilarviralsubsets,albeitwithdifferent withoscillationsinhostdensitydefinedbyourecologicalmodel spacers. parameters (the small red triangles near the diagonal in Fig. 3). Withtheexceptionofearlytimepointsinoursimulation,we Weobservethatbothhostsandviruseshaveashortlifetime(see donotobservecompletesweepswithasinglestraineliminating Fig.S2). allothers;thetime-scaleofevolutionandecologyismixedsuch Similaritybetweenpopulationscanextendbeyondshortos- that multiple strains nearly always coexist in our simulations. cillationtimescalesofinvasionwhenthereisnear-dominanceby Further, we find that in the different evolutionary dynamics de- alimitednumberofhoststrainsasshownintheFigure3insert.We scribedabove,hostsrarelyincorporatethenewestprotospacer(see donotobserveafixedtimescaleorintervalforhostpopulations Fig.S4).Rather,thehostspacerstatereflectsahistoryoftheeco- tobemaintainedwithnonzerosimilarity.Eventually,thederived logicalsuccessofvirusesandtheirprotospacers,butnotneces- strains out compete their ancestors, causing these older strains sarilyachronologicalhistoryofprotospacerappearance.Below, tofallbelowthethresholdandberemovedfromthesimulation. wedescribeeachoftheseevolutionarydynamicsingreaterdetail. Throughoutthesimulation,thefactthatpopulationsarenotsimi- First, incomplete selective sweeps are expected given the lartopopulationsatmuchlatertimesimpliesthatthesubsequent fact that ecological and evolutionary time scales are mixed in populationsaredominatedbynovelevolvedhostsratherthanre- this model. An incomplete selective sweep occurs when a host currences of prior hosts over the simulation time scale of 2500 strainevolvesthatisresistanttoviraltypes,expandingitsrange generations(seethelargebluesectioninthebottomleftofFig.3). ofresistance.Suchahoststrainusuallyevolvesfromahoststrain thatwasabundant,asnotedabove.Ifthisstrainhasasignificant INCOMPLETE SWEEPS AND BOUTS OF DIVERSIFYING advantage(i.e.,maintainsaspacerthatmatchesthedominantviral EVOLUTION DRIVE CHANGES IN POPULATION variants),itcangrowtodominateinabundanceduringthenext COMPOSITION IN THE SHORT TERM period of high density (see red, green, and dark blue curves in Here,wefocusonevolutionarydynamicsoccurringontheshort Fig. S3). In response, a viral variant that can infect the newly term.Wenotethatatanypointinthesimulation,thereareonly derived host has a competitive advantage and quickly increases a few dominant host and viral strains whereas many diversified in frequency. This is consistent with the arms race dynamic of strainsexistatlowabundance(seeFigs.S1andS3).Throughout coevolution through successive rises and falls of newly evolved thesimulation,novelhostvariantsevolvefromtwotypesofances- hostandviralstrains(BucklingandRainey2002). torstrains:highlyabundantstrains,becausetheymakeupmore Rare, older host strains may also rise to dominance at pe- of the population and thus have a greater chance of interacting riods of high host density in a manner consistent with negative withvirusesandgainingaspacerviadirectedmutation;andless- frequency-dependent selection, albeit for mechanisms that are abundantstrainsthatalreadyhaveimmunitytoaninfectingvirus predominantlyevolutionaryinnature(seeFig.S3lightblueand becausetheyhavethepossibilitytoacquireanadditionalspacer magentacurvesandthetwoorangepeaksinFig.4,notethatcolors duringsuccessfulCRISPRimmunedefensetoviralinfection.The shouldbeinterpretedseparatelyforthesetwofigures).Incontrast secondmechanismforgeneratingnewstrainsoccursinfrequently totheselectivesweepmodel,thesestrainsarenotderiveddirectly becausefewviral–hostinteractionsoccuramongrarestrains. from the previously highly abundant strain. Instead their fitness Because viral mutation is undirected, the majority of new advantagearisesbecauseviralpopulationstowhichtheyareim- viralmutantsdonothavesignificantlyincreasedfitnessbecause munegrowinnumberwhentargetingadifferenthigh-abundant theyhavenotmutatedthespecificprotospacerthatdominanthost nonimmune host. Likewise, viral populations to which they are strainshaveimmunityto.Onaverage,only1/vmutations,where notimmunemaydecreaseinnumberwhentargetedbyCRISPR visthenumberofprotospacers,willproduceaviralmutantthat immunity of abundant hosts. Together, these mechanisms may alters the dominant host immunity. In contrast, because of the haveasecondaryeffectofdecreasingviral-inducedmortalityof CRISPRmechanismofdirectedmutation,hostsresponddirectly this rare, older strain (which possess a different array of spac- toviralselection. ers).Decreaseofviral-inducedmortalityleadstothepopulation 2022 EVOLUTION JULY2012 MULTISCALE MODEL OF CRISPR-INDUCED COEVOLUTIONARY DYNAMICS −5 −5 p=10 ,q=10 1 d erre 0.8 nf o c y 0.6 nit u m m 0.4 e i v ati el 0.2 R 0 Figure 4. Proportion of host strains in the population. Host 0 1 2 3 4 5 6 7 8 strains (independent colors—colors repeat when not directly Number of spacers touching)arebornintothepopulationandincreaseinsizeover time.Thetotalheightofthecoloredareaisproportionaltothe Figure 5. Mostrecentlyacquiredspacersprovidegreatestimmu- population size and the vertical height of each color within the nity.Relativeimmunityconferredbythenewestnspacersinthe coloredareaisproportionaltothepercentofthepopulationcom- locuscomparedtotheimmunityfromthefulllocusofeightspac- prised by each strain. Strains first appear in the middle of the ers.Mean(circles)andstandarddeviation(errorbars)werecom- colorthatistheirparentstrain.Somenovelstrains(i.e.,lightblue putedfor100replicatesaveragedoverthetimepointsafterthe att≈1625denotedbyN)rapidlybecomethedominantstrain.At locusisfilledwithspacers.Immunityisdeterminedbycalculating times,multiplehostsemergeascoalitionsandcomprisesignificant what percentage of the viruses the most recent nspacers from portionsofthepopulation(i.e.,att≈1675denotedbyC).Finally, all hosts can match, where n=1,2,...,8. Relative immunity is recurrenceofstrainscanbeobserved(orangepeaksat t≈1610 thepercentageofvirusesthatthemostrecentnspacersfromall and t≈1640correspondtothesamestraindenotedbyR).Only hosts can match compared to the percentage of viruses the full hoststrainscomprisingatleast1%ofthepopulationareincluded. spacerlocus(inourcaseeightspacers)matches.Themajorityof Thetotalviralpopulationdensityisshowninthelowerpanel.This theimmunityisprovidedbythefirstspacerandmorethan80% graphshowsresultsfromasinglerepresentativesimulation(out immunityisprovidedbythefirstthreespacers. of100replicates). expansionoftherare,olderstrain.Aspointedoutabove,wedo aCRISPRlocus(HeandDeem2010)inagreementwithobser- not observe any host strains that persist over the time course of vation (Horvath et al. 2008; Held et al. 2010). This diversity is theentiresimulation. consistentwiththemechanismbywhichspacersareinsertedat Finally,wealsoobserveperiodsofhighhostpopulationden- theleadingpositionofthelocus.Ourmodeldemonstratesthatnot sity in which there is not a single dominant host strain but a onlyaretheleadingspacersmorediverse,buttheyalsoemerge coalitionofhoststrainsthatrisetogethertohighabundance.This asthemostimportantspacersforprovidingthehoststrainswith can result from multiple host strains each gaining a new spacer CRISPRimmunity(seeFig.5). Foreachtimepoint,immunity thatmatchesadifferentprotospacerinadominantvirusorviruses isdeterminedbycalculatingthepercentageofthetotalviralpop- (i.e., around 1675 h in Fig. 4). These strains are phenotypically ulation to which hosts harbor matching spacers. For example, nearlyidenticalbutdiffergenotypically.Thesecoalitionsfallin the immunity provided by the first two spacers measures what abundanceduetotheriseofadivergentvirusthatdoesnotpossess percentage of all viruses the first two spacers of all host strains anyprotospacersthatmatchanyofthenewlyaddedspacers.The match at a particular time point. Relative immunity is the im- riseofacoalitionofhostscanalsoresultfromasetofrarestrains munitycalculatedforparticularsetsofspacerscomparedtothe alreadyexistinginthepopulationinamannersimilartowhatis immunity calculated for the full spacer locus (eight spacers in describedabove. ourcase).Averagevaluesarecomputedoverallhostsacrossall time points after the locus is full of spacers. This measurement IMMUNITY OF HOSTS IS CONTROLLED BY RECENTLY indicateswhichofthespacerpositionsaremostimportantforpro- ACQUIRED SPACERS vidingCRISPRimmunity.Wefindthatthefirst(andmostrecent) A recent coevolutionary CRISPR model of hosts and viruses spacerofthelocuscontributesthegreatesttotheimmunityofthat found that spacers are more diverse in the leading position of locus,relativetotheimmunityprovidedfromtheentirelocus.The EVOLUTION JULY2012 2023 LAUREN M. CHILDS ET AL. contribution of subsequent spacers decreases such that only the is not a monotonically decreasing function of q. When q =0, firstfivelociarerequiredtoprovide>90%ofimmunity,onav- thereexistsasinglesusceptiblehost,andtheviralpopulationhas erage.Hence,theoldestspacerscontributeinsignificantlytothe the steady-state value V∗ = r(1−N∗/K) where N∗ = m (see φ φ(β−1) immunityofthelocus(seeFig.5).Althoughtheincreaseddiver- Supporting Information). When q =1, all viral strains will be sityofleadingspacerswaspreviouslyknown,boththeoretically eliminated. However, when q is increased slightly above zero, and empirically, until this model, it was not clear how impor- we find that viral density increases. Viral density increases re- tant the recently acquired spacers were to CRISPR immunity. sultfromagreaterabilityofvirusestoreplicate,eitherbecause Theemergentpropertyfromourmodelstronglysupportsthehy- there exist more host strains that lack immunity or because the pothesisthatrecentlyacquiredspacerscontributesubstantiallyto hoststrainsthatlackimmunityhavelargerpopulations.Notethat CRISPRimmunity,andmoreover,predictsthattheyaresufficient multistrain Lotka–Volterra models with fixed carrying capacity for the CRISPR immune response, regardless of spacer identity predictthatviraldensityincreaseswiththenumberofhoststrains atthetailendofthelocus. (see Supporting Information). In the case of increasing q, we expectincreasesinthenumberofhoststrainsandthusthepos- DEPENDENCE OF COEVOLUTIONARY sible secondary effect of increasing viral diversity and density DIVERSIFICATION ON THE CRISPR IMMUNITY (seeSupportingInformation).Highervaluesofqalsoleadtohost PARAMETERS populationsthatrecognizeagreaterproportionoftheviralpopu- The molecular component determines whether viral infection lation(seeFig.6E).Additionally,athighervaluesofqandfaster leadstohostlysisorviraldeactivation.Theerrorsassociatedwith acquisition,morethanjustthefirstfewspacersareimportantfor CRISPRimmunityappearinthemodelthroughtheconstants p immunity (see Fig. 7). This is because the viral strains which andq.Recallthatprepresentsthestochasticfailureofahostwith the older spacers recognize still exist in the population. See the CRISPRimmunitytorecognizeaninvadingvirus.Further,recall SupportingInformationforanalysisofvariationinthespacerand thatq representstheacquisitionrateofspacersbyhosts. protospacernumberswhileholding pandqconstant—wedonot Here, we consider the effect of varying values of p and q observeanyqualitativedifferencesindynamics. (aroundexperimentallyobservedvalues)ontheoutcomesofthe multiscale coevolutionary model. In varying these parameters, we considered values of p and q ranging from 10−6 to 10−4. Discussion We ran simulations using all nine possible combinations of p Wehavepresentedamodelofcoevolutionarydynamicstoanalyze and q values in this range separated by a factor of 10, that is, thediversificationofhostsandviruses.Themodeldemonstrates p=10−6,10−5,10−4andq =10−6,10−5,10−4.Wefindthatal- how an initially small number of host and viral strains can di- tering pwithinthisrangehasalmostnoeffectonthedynamicsin versifyintoadynamiccommunityofmanyhostsandvirusesas thesimulations.Thereasonwhypisnotamajordriverofdynam- anticipatedfromempiricalstudies(Heidelbergetal.2009;Held icsatsmallvaluescanbeunderstoodbyexaminingequations(1– etal.2010).Insodoing,weconfirmtheoreticallythehypothesis 2).Small valuesof p increaselysis rates,albeit multiple orders suggestedfromempiricaldatathatifCRISPRimmunityandviral of magnitude less than lysis rates of hosts that are not CRISPR diversificationremaininbalance,arelativelystablevirusandhost immune. Hence, viral lysis is driven by the interaction of hosts communitymayresult(AnderssonandBanfield2008;Heldetal. andvirusesforwhichCRISPRimmunityisnotpresent,whereas 2010). Specifically, the model predicts that diversity over time changing p onlyaffectsvirallysisinthoseinteractionsofhosts ismaintainedbythetemporallylimitedemergence,dominance, and viruses for which CRISPR immunity is present. However, andreplacementofstrains(andcoalitionsofstrains).Weobserve varying q significantly modifies the complexity of communi- incompletesweepsbysinglestrains,theoccasionalrecurrenceof ties even at small values. Higher values of q, corresponding to rare,olderstrainsthatobtaintemporaryfitnessadvantages(sim- morerapidacquisitionofspacers,leadtoahighnumberofhost ilar in form to negative frequency-dependent selection), as well strainswithoutsignificantchangeinthehostpopulationsize(see astheemergenceofcoalitionswhopossessnearlyidenticalphe- Fig.6A,C).Weexpectmorehoststrainswithhigherq because notypeswithdistinctspacergenotypesaspredictedinHeldetal. hoststrainsaredistinguishedbytheirspacerstates,whichevolve (2010).Thebalanceofviralandhostcoevolutionoccursdespite morerapidly.Infact,onlyonespacerneedstobedifferenttobe thefactthattheCRISPRsystemundergoesdirectedmutation.We consideredadifferentstrain.Athighervaluesofq,viralstrains,on findthatthehostsgenerallycannotacquiresomanyspacerssuch theotherhand,haveahighnumberofstrainsandhighpopulation thattheviralpopulationgoesextinct.Indeed,viralmutantsthat size (see Fig. 6B, D). This trend of increasing viral population can target dominant hosts are under positive selection because size as the hosts can more easily acquire spacers may at first theirreplicationwillbegreaterondominanthosts.Hence,evolu- seemcounterintuitivebecauseitimpliesthatviralpopulationsize tionarychangesinviralstraincompositiondrivesthechangein 2024 EVOLUTION JULY2012