Vol. 160, No. 4, The American Naturalist, October 2002

The Continuous Prisoner’s Dilemma and the Evolution of Cooperation through Reciprocal Altruism with Variable Investment

Timothy Killingback¹ and Michael Doebeli²

1. Ecology and Evolution, Eidgenössische Technische Hochschule (ETH) Zurich, 8092 Zurich, Switzerland;
2. Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada

Submitted January 2, 2002; Accepted April 3, 2002

Am. Nat. 2002. Vol. 160, pp. 421–438. © 2002 by The University of Chicago. All rights reserved.

Abstract: Understanding the evolutionary origin and persistence of cooperative behavior is a fundamental biological problem. The standard “prisoner’s dilemma” is the most widely adopted framework for studying the evolution of cooperation through reciprocal altruism between unrelated individuals, but it does not allow for varying degrees of cooperation. Here we study the continuous iterated prisoner’s dilemma, in which cooperative investments can vary continuously in each round. This game has previously been considered for a class of reactive strategies in which current investments are based on the partner’s previous investment. In the standard iterated prisoner’s dilemma, such strategies are inferior to strategies that take into account both players’ previous moves, as is exemplified by the evolutionary dominance of “Pavlov” over “tit for tat.” Consequently, we extend the analysis of the continuous prisoner’s dilemma to a class of strategies in which current investments depend on previous payoffs and, hence, on both players’ previous investments. We show, both analytically and by simulation, that payoff-based strategies provide a natural explanation for the gradual evolution of cooperation from an initially noncooperative state and for the maintenance of cooperation thereafter. These strategies embody the intuitively appealing idea that individuals invest more in cooperative interactions when they profit from these interactions.

Keywords: evolution of cooperation, prisoner’s dilemma, reciprocal altruism, adaptive dynamics, variable investment.

The origin and maintenance of cooperative or altruistic behavior is one of the most enduring and intractable theoretical problems in evolutionary biology (see, e.g., Hamilton 1963, 1964a, 1964b; Trivers 1971; Axelrod and Hamilton 1981; Axelrod 1984; Dugatkin 1997). By cooperative behavior we mean behavior that involves a cost to the individual or individuals performing it and benefits another individual or individuals. Such behavior is extremely widespread in nature (see Dugatkin 1997 for a review of many types of cooperative behavior), and yet there is a fundamental difficulty in giving a satisfactory theoretical explanation of it. The difficulty arises from the fact that selfish individuals can reap the benefits of cooperation without bearing the costs of cooperating themselves. Thus selfish individuals, who do not cooperate, will have a fitness advantage over cooperative individuals, and natural selection will lead to the cooperators being eliminated by the noncooperators.

Increased levels of cooperation among the individuals in a group increase the mean fitness of the group members. There would therefore be no paradox surrounding the evolution of cooperation if natural selection acted effectively at the group level (Wynne-Edwards 1962). Selection can in principle act at such a level, but it generally acts much more effectively at the individual level (Maynard Smith 1964, 1976; Williams 1966; Grafen 1984). Thus the paradox of cooperation remains: although a general increase in cooperation benefits all members of a group, individual selection will favor selfish individuals who benefit from the cooperation of others while not incurring the costs of cooperating themselves.

The evolution of cooperation among related individuals can typically be explained by kin selection (Hamilton 1963, 1964a, 1964b; see Frank 1998). Fundamentally different ideas are required to explain the emergence of cooperation among unrelated individuals. Proposed explanations include reciprocal altruism (Trivers 1971), trait-group selection (Wilson 1975, 1980; Cohen and Eshel 1976; Matessi and Jayakar 1976), spatial structure (Nowak and May 1992; Killingback et al. 1999), and indirect reciprocity (Alexander 1986; Nowak and Sigmund 1998).

Kin selection is the standard theoretical framework for understanding cooperation between relatives. It depends on the fact that genes determining altruistic traits can increase in frequency if the interacting individuals are genetically related. This idea was anticipated by Fisher (1930) and Haldane (1932). Haldane expressed it in 1955 in his memorable comment that he was willing to lay down his life to save two brothers or eight cousins. Hamilton (1964a, 1964b) then developed it into a systematic theory. The essence of kin selection lies in Hamilton’s famous formula. It states that if an interaction occurs in which a “donor” sacrifices $c$ offspring and a “recipient” gains an additional $b$ offspring, then the gene causing the donor to act in this way will increase in frequency if $br > c$, where $r$ is the degree of relatedness of the donor to the recipient. If the gene is rare, then this result is rather clear. What is much less clear, as Hamilton (1964a, 1964b) showed, is that the result still holds even if the gene is at high frequency. Kin selection is bound to operate wherever related individuals interact. It is now generally accepted to be an important factor in the evolution of altruistic behavior in the social insects (Wilson 1971; Hamilton 1972; West-Eberhard 1975; Bourke and Franks 1995; Crozier and Pamilo 1996). It is also relevant to the evolution of cooperation in vertebrates (Brown 1978, 1987; Emlen 1978, 1984, 1995, 1996; Pusey and Packer 1994). It may even bear on the origin of the integration of the components of eukaryotic cells (Maynard Smith and Szathmary 1995).

Kin selection provides an elegant and powerful theoretical framework for studying the evolution of cooperation among relatives. Understanding the evolution of cooperation among nonrelatives, however, requires quite different ideas. Cooperative behavior is common among individuals who are either unrelated or have a low degree of relatedness. Many examples of such behavior are reviewed, e.g., in Dugatkin (1997); the following may be selected as illustrations:

- allogrooming in impala (Hart and Hart 1992; Mooring and Hart 1992);
- egg swapping in hermaphroditic fish (Fischer 1980, 1984, 1986, 1988) and hermaphroditic worms (Sella 1985, 1988, 1991);
- predator inspection in fish (Milinski 1987, 1990a, 1990b, 1992);
- blood sharing in vampire bats (Wilkinson 1984).

A number of theoretical ideas have been suggested to explain the occurrence of cooperation among nonrelatives. Trivers (1971) laid the foundations of the theory of reciprocal altruism. He pointed out that if there are opportunities for repeated interactions between the same individuals, then natural selection will favor an individual who behaves altruistically only toward those who reciprocate the altruistic act. In general, such reciprocal altruism can evolve only if three conditions hold: the same individuals meet repeatedly, they are capable of memory and recognition, and the benefits to the individual who is helped exceed the costs to the helper.

There is strong evidence for reciprocal altruism in a number of animal systems. In fact, the examples of cooperation among nonrelatives given above are all cases where reciprocal altruism is believed to play an important role.

- Impala. There is very good evidence of reciprocal altruism in allogrooming, from the work of Hart and Hart (1992) and Mooring and Hart (1992). In this species, pairs of individuals engage in reciprocal bouts of allogrooming, the apparent function of which is to reduce the tick load of the groomed individual.
- Predator inspection in fish. There is also strong evidence from the work of Milinski (1987, 1990a, 1990b, 1992) and others (see, e.g., Dugatkin 1988; Dugatkin and Alfieri 1991). During predator inspection, a small group of fish detaches from a shoal to approach a potential predator (Pitcher et al. 1986). The inspectors subsequently convey, either actively or passively, the information they have obtained to the noninspectors (Magurran and Higham 1988).
- Other systems. There is also good evidence in hermaphroditic fish (Fischer 1980, 1984, 1986, 1988) and worms (Sella 1985, 1988, 1991), in vampire bats (Wilkinson 1984), and in a number of other systems (Packer 1977; Lombardo 1985).

For a detailed review of those cases where reciprocal altruism is believed to play a role, see Dugatkin (1997).

Other theoretical ideas have been proposed as possible explanations of cooperation among nonrelatives:

- Trait-group selection (Wilson 1975, 1980; Cohen and Eshel 1976; Matessi and Jayakar 1976). This depends on a synergistic fitness component. Because of it, the interactions between members of a group cannot be represented as a simple loss of fitness by one member and a corresponding gain by the others, with the losses and gains combining additively.
- Spatially structured populations (Nowak and May 1992; Killingback et al. 1999). These may provide a natural explanation of how cooperation can evolve among very simple organisms.
- Indirect reciprocity (Alexander 1986; Nowak and Sigmund 1998). This may provide a new type of explanation for cooperative behavior among humans and perhaps some higher animals such as primates.

Cooperative behavior is both widespread in nature and fundamentally perplexing from an evolutionary perspective. It is therefore most important to have simple models of cooperation that allow detailed theoretical investigation of the evolution of altruism. The standard metaphor for the problem of cooperation is the “prisoner’s dilemma” (Trivers 1971; Smale 1980; Axelrod and Hamilton 1981; Axelrod 1984).
The prisoner’s dilemma is a symmetric two-person game that illustrates well the paradox surrounding the evolution of cooperation. Each player has two possible strategies, cooperate (C) and defect (D), and the payoffs are given in figure 1:

- $T$ is the temptation to cheat;
- $R$ is the reward for mutual cooperation;
- $P$ is the punishment for mutual defection;
- $S$ is the payoff one obtains when cooperating against an opponent who defects (the sucker’s payoff).

The prisoner’s dilemma is defined by requiring that $T > R > P > S$ and $2R > T + S$. The dilemma inherent in this game is the following. Since $T > R$ and $P > S$, player 1 should always play D, whatever player 2 does. Since the game is symmetric, the same reasoning also applies to player 2. Thus both players play D and receive a payoff of $P$, but had they both played C they would have received the higher payoff of $R$. The prisoner’s dilemma therefore contains, in prototype form, the fundamental paradox of cooperation: there is an advantage to cheating over cooperating, yet mutual cooperation is more profitable than mutual defection. The significance of the prisoner’s dilemma as a simple model for the evolution of cooperation seems to have been appreciated first by Hamilton and by Trivers (see comments in Trivers 1971). Suppose individuals are trapped in a prisoner’s dilemma, with each pair playing the game only once and with random interactions between individuals. It follows directly from the structure of the game that natural selection will then always favor defectors (Axelrod and Hamilton 1981; Axelrod 1984). Thus in a population of cooperators and defectors, evolution will always result in the cooperators being eliminated and the defectors going to fixation.

Figure 1: Payoff matrix for the prisoner’s dilemma. The entries are the payoff to a player using a strategy from the left column when the opponent uses a strategy from the first row.

$$
\begin{array}{c|cc}
 & C & D \\ \hline
C & R & S \\
D & T & P
\end{array}
$$

Cooperation can never evolve for any entities trapped in a one-shot prisoner’s dilemma. It can evolve, however, if the game is played repeatedly between the same individuals. The simplest model embodying Trivers’s (1971) concept of reciprocal altruism is the iterated prisoner’s dilemma (Axelrod and Hamilton 1981; Axelrod 1984). It consists of a repeated sequence of prisoner’s dilemma games played between the same two players, with each player’s move at each stage depending on the history of past moves by both players. The great merit of the iterated prisoner’s dilemma is that it provides a simple, yet precise, model with which to study the evolution of cooperation via reciprocity. It is now the standard model for studying the evolution of cooperation by reciprocal altruism and has generated a vast amount of research (some of which is reviewed by Axelrod and Dion 1988).

Much of this research has focused on identifying strategies for the iterated prisoner’s dilemma that allow the evolution of cooperation. In their original work, Axelrod and Hamilton (1981) and Axelrod (1984) considered the strategy “tit for tat,” which had been devised by Anatol Rapoport. Tit for tat cooperates on the first round of the game and, on each subsequent round, plays the move the opponent played in the previous round. Axelrod and Hamilton (1981) and Axelrod (1984) found that tit for tat often did well in the iterated prisoner’s dilemma and that this strategy could lead to the evolution of cooperative behavior.

Subsequent work has found other strategies that often perform even better than tit for tat:

- Generous tit for tat. Nowak and Sigmund (1992) found that this more generous variant, which does not always retaliate against an opponent’s defection, is often a good strategy.
- Pavlov. Nowak and Sigmund (1993) found that this strategy, which depends on both players’ previous moves, usually outperforms tit for tat. Pavlov cooperates on the first move. On subsequent moves, it repeats its own last move after receiving a payoff of $R$ or $T$, and switches to the opposite of its own last move after receiving $P$ or $S$.

Pavlov has two important advantages over tit for tat. First, it can correct mistakes: unlike under tit for tat, a single mistake does not lead to a permanent breakdown in cooperation. Second, Pavlov can exploit unconditional cooperators. This prevents them from increasing in frequency to the point where they can lead to the growth of unconditional defectors.

In studying the evolution of cooperative strategies in the iterated prisoner’s dilemma, it is necessary to specify whether or not the two players move simultaneously. If their moves are not synchronized, then one obtains an iterated game consisting of an alternating sequence of prisoner’s dilemma games (Frean 1994; Nowak and Sigmund 1994). Different strategies do well in these two variants. In the simultaneous case, Pavlov appears to be a very good strategy (Nowak and Sigmund 1993). In the alternating case, generous tit for tat seems to be the better strategy (Nowak and Sigmund 1994).
Despite its widespread use as a model of cooperation, the prisoner’s dilemma suffers from a fundamental limitation: each player has only two possible options, to cooperate or to defect. Behavior in real systems can hardly be expected to be this sharply discrete. Indeed, there is considerable evidence that cooperative behavior in nature should be viewed as a continuous rather than a discrete trait. For example, work on allogrooming in impala (Hart and Hart 1992; Mooring and Hart 1992) found that the number of grooming bouts delivered varies substantially. Work on predator inspection in fish has likewise shown that different degrees of “cooperation” often occur (see Dugatkin and Alfieri 1991). It seems likely that in most cases cooperative behavior can vary in degree, with complete defection being one extreme.

The importance of allowing variable degrees of cooperation in theoretical studies has been appreciated for some time (Smale 1980; May 1987; Frean 1996; Wilson and Dugatkin 1997). Only recently, however, have detailed models been formulated that extend the standard prisoner’s dilemma to varying degrees of cooperation (Doebeli and Knowlton 1998; Roberts and Sherratt 1998; Killingback et al. 1999; Wahl and Nowak 1999a, 1999b). We refer to this extension as the “continuous prisoner’s dilemma” and describe it in detail in the next section. It provides a natural model for studying cooperation when the cooperative trait is continuous in nature.

An important consequence of formulating the continuous prisoner’s dilemma is that it allows detailed study of the evolution of cooperation via reciprocal altruism for systems with continuous cooperative traits. This approach opens fundamental new aspects of the evolution of cooperation to study. The two most important questions are the following:

1. Can reciprocal altruism with variable degrees of cooperation result in cooperative behavior evolving gradually from an initially selfish state?
2. Can cooperative behavior be maintained indefinitely, or does the continuous nature of the trait cause a cooperative state to be gradually undermined?

That is, in the continuous framework we can investigate how cooperative behavior can arise in the first place and whether or not it is stable. The problem of how cooperation initially evolves in a selfish world has proved hard to resolve satisfactorily within the standard framework of the discrete prisoner’s dilemma (see, e.g., Axelrod and Hamilton 1981; Axelrod 1984). The approach introduced here, in which cooperation is a continuous trait, allows a new attack on this fundamental problem.

Understanding the evolution of cooperation by reciprocal altruism in this framework involves two steps. The first is formulating the iterated version of the continuous game, the iterated continuous prisoner’s dilemma. The second is understanding how strategies for the iterated game can evolve to allow cooperation. This problem has been studied by Roberts and Sherratt (1998) and by Wahl and Nowak (1999a, 1999b). These authors study different strategies for the continuous prisoner’s dilemma (described in more detail in “Discussion”), but in both cases the strategies depend only on the opponent’s previous move. In this respect, these strategies are analogous to reactive strategies, such as tit for tat, in the standard iterated prisoner’s dilemma.

In the standard iterated prisoner’s dilemma, however, strategies that depend on both players’ previous moves, such as Pavlov, have been successful (Nowak and Sigmund 1993). It is therefore natural to ask whether there are successful strategies for the continuous prisoner’s dilemma that depend on both players’ moves. Here we introduce such a class of strategies, which depend on the player’s payoff in the previous round; they are related to those considered by Doebeli and Knowlton (1998) in the context of mutualism. We show, both analytically and by simulation, that these strategies give a simple and elegant explanation of the evolution of cooperation via reciprocal altruism when the degree of cooperation can vary. In particular, we show that this new class, which we call payoff-based strategies, provides a natural resolution of the fundamental problem of how cooperative behavior can evolve in an initially selfish world and how such behavior can be maintained thereafter.

The Continuous Prisoner’s Dilemma and Payoff-Based Strategies

Here we introduce the continuous prisoner’s dilemma, following the general approach described in Killingback et al. (1999), and the class of payoff-based strategies that we consider in the iterated game. Perhaps the simplest way to arrive at the continuous prisoner’s dilemma is to start from the discrete prisoner’s dilemma in its most biologically intuitive formulation. As above, C and D denote the strategies of cooperation and defection, respectively. We make the following assumptions:

- Cooperation costs the “donor” $c$: the donor has $c$ fewer offspring than it would otherwise have had.
- Cooperation brings the “recipient” a benefit $b$: the recipient has $b$ extra offspring in addition to those it would otherwise have had.
- Defection has zero cost to the donor and gives zero benefit to the recipient.

Under these assumptions, the payoffs to two individuals adopting the strategies C and D are as shown in figure 2. In the context of the evolution of cooperation, we require that the cost and benefit are positive ($b, c > 0$) and that the benefit exceeds the cost ($b > c$). If the latter condition fails, cooperation can never evolve, for trivial reasons. When both conditions hold, the payoffs in figure 2 define a prisoner’s dilemma. Indeed, setting $T = b$, $R = b - c$, $P = 0$, and $S = -c$ gives $T > R > P > S$ and $2R > T + S$. It is now straightforward to make the transition to the continuous prisoner’s dilemma.

Figure 2: Payoff matrix for the prisoner’s dilemma in terms of costs $c$ and benefits $b$. The entries are the payoff to a player using a strategy from the left column when the opponent uses a strategy from the first row.

$$
\begin{array}{c|cc}
 & C & D \\ \hline
C & b - c & -c \\
D & b & 0
\end{array}
$$
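The reduction of figure 2 to the standard labels can be checked mechanically. This minimal sketch uses hypothetical function names. It confirms that, with $T = b$, $R = b - c$, $P = 0$, and $S = -c$, the two defining inequalities hold exactly when $b > c > 0$: $T > R$ and $P > S$ require $c > 0$, while $R > P$ and $2R > T + S$ both require $b > c$.

```python
def pd_from_costs_benefits(b, c):
    """Map the donation game of figure 2 onto the labels (T, R, P, S)."""
    return b, b - c, 0.0, -c


def is_prisoners_dilemma(T, R, P, S):
    """The two defining inequalities of the prisoner's dilemma."""
    return T > R > P > S and 2 * R > T + S


# Only the first pair (benefit exceeds a positive cost) yields a dilemma.
for b, c in [(3.0, 1.0), (1.0, 3.0), (2.0, 2.0), (2.0, 0.0)]:
    print(b, c, is_prisoners_dilemma(*pd_from_costs_benefits(b, c)))
```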
The degree of cooperation an individual engages in is defined by an investment $I$ (the term “donation” would perhaps be more appropriate). Making an investment $I$ has two effects, both depending on the level of investment $I$:

- a cost $C(I)$ to the donor, i.e., a decrease in its fitness;
- a benefit $B(I)$ to the recipient, i.e., an increase in its fitness.

Thus if two individuals make investments $I$ and $I'$, respectively, and play simultaneously, the payoff to individual 1 is $S(I, I') = B(I') - C(I)$ and the payoff to individual 2 is $S'(I', I) = B(I) - C(I')$.

In principle, we could consider arbitrary functions for $B$ and $C$, but two cases for each are particularly natural. For $B$:

- Concave $B$. The benefit then shows diminishing returns as the investment increases. It seems likely that in most natural systems the benefit will exhibit diminishing returns for sufficiently large investments (see Altmann 1979; Weigel 1981; Schulman and Rubenstein 1983).
- Linear $B$. Linear benefit functions arise as approximations to more general concave functions when investment levels are low.

For $C$:

- Quadratic $C$. There is good evidence that in many situations the cost is well described by a quadratic function (Sibly and McFarland 1976).
- Linear $C$. Linear cost functions are interesting because they arise as approximations to more general cost functions.

Typical cost and benefit functions are shown in figure 3:

- figure 3A shows a linear benefit function and a linear cost function;
- figure 3B shows a concave benefit function and a linear cost function;
- figure 3C shows a concave benefit function and a quadratic cost function.

Once the cost and benefit functions have been defined, the system defines the continuous prisoner’s dilemma (see Killingback et al. 1999). The analytical results we describe below hold for arbitrary cost and benefit functions. The simulation results we present below use the cost and benefit functions shown in figure 3. However, similar simulation results are obtained for a wide variety of qualitatively similar cost and benefit functions. The specific form of these functions affects only the quantitative details of the results, not the qualitative behavior of the system.

To study reciprocal altruism, we consider the iterated continuous prisoner’s dilemma. This requires strategies that determine a player’s move given the past history of moves by both players. A strategy $\varphi$ of memory $l$ for the iterated continuous prisoner’s dilemma determines the player’s investment $I_k$ in round $k$ in terms of the previous $l$ moves made by both players:

$$I_k = \varphi(I_{k-1}, I_{k-2}, \ldots, I_{k-l}, I'_{k-1}, I'_{k-2}, \ldots, I'_{k-l}), \tag{1}$$

where $I_j$ and $I'_j$ denote the investments in round $j$ of the player and the opponent, respectively. The opponent’s investment in round $k$ is similarly determined by a strategy $\varphi'$: $I'_k = \varphi'(I_{k-1}, \ldots, I_{k-l}, I'_{k-1}, \ldots, I'_{k-l})$. In each round, the payoffs are calculated as described above. Let $\varphi$ and $\varphi'$ be two strategies, and let $w$ be the probability of a further move after each round; a game then lasts on average $n = 1/(1 - w)$ rounds. We define the payoff $E_w(\varphi, \varphi')$ as the expected mean payoff per round that $\varphi$ obtains against $\varphi'$. The limiting case $w = 1$ corresponds to an infinitely iterated game.

In principle, a memory-$l$ strategy for the iterated continuous prisoner’s dilemma can be an arbitrary function of the last $l$ investments made by both players. Here we consider a simple class of strategies obtained by restricting these general strategies in three ways:

1. We consider strategies of memory 1. It would be very interesting to include strategies of memory $l > 1$. The restriction to $l = 1$ is nonetheless not unrealistic: even in humans, working-memory limitations have been shown to result mostly in memory-1 or memory-2 strategies in the iterated prisoner’s dilemma (Milinski and Wedekind 1998).
2. A player’s strategy depends on the payoff the player obtained in the previous round.
3. The function $\varphi$ specifying the strategy is a linear function of that payoff.
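To make the continuous game concrete, this sketch implements the stage payoff $S(I, I') = B(I') - C(I)$ for the three cost and benefit combinations of figure 3. All numerical parameters are illustrative assumptions, not values from the paper. The helper `is_pd_at_level` is hypothetical. It restricts both players to the two investments 0 and $x$, which reproduces figure 2 with $b = B(x)$ and $c = C(x)$. The restricted game is therefore a prisoner’s dilemma exactly when $B(x) > C(x) > 0$.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


# Shapes named in figure 3; every numerical parameter used below is an
# illustrative assumption, not a value taken from the paper.
def linear(k: float) -> Fn:
    return lambda I: k * I


def quadratic(k: float) -> Fn:
    return lambda I: k * I * I


def saturating(amp: float, rate: float) -> Fn:
    """Concave benefit amp * (1 - exp(-rate * I)) with diminishing returns."""
    return lambda I: amp * (1.0 - math.exp(-rate * I))


def stage_payoffs(I: float, I_opp: float, B: Fn, C: Fn) -> tuple[float, float]:
    """One simultaneous round: S(I, I') = B(I') - C(I) and S'(I', I) = B(I) - C(I')."""
    return B(I_opp) - C(I), B(I) - C(I_opp)


def is_pd_at_level(x: float, B: Fn, C: Fn) -> bool:
    """Both players restricted to investing 0 or x: figure 2 with b = B(x), c = C(x)."""
    T, R, P, S = B(x), B(x) - C(x), 0.0, -C(x)
    return T > R > P > S and 2 * R > T + S


CASES = {
    "A": (linear(2.0), linear(1.0)),              # linear benefit, linear cost
    "B": (saturating(4.0, 1.0), linear(1.0)),     # concave benefit, linear cost
    "C": (saturating(4.0, 1.0), quadratic(1.0)),  # concave benefit, quadratic cost
}

for name, (B, C) in CASES.items():
    print(name, stage_payoffs(0.5, 0.5, B, C), is_pd_at_level(0.5, B, C))
```

With the quadratic cost, large investments eventually cost more than the saturating benefit returns. The restricted game then stops being a prisoner’s dilemma, which is one reason the analysis below concentrates on small investments.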
Linear payoff-based strategies of this type can be defined formally as follows. In the first round, invest $I_1 = a$; in round $k + 1$, invest

$$I_{k+1} = a + bP_k, \tag{2}$$

where $P_k = B(I'_k) - C(I_k)$ is the player’s payoff in round $k$ and $a$ and $b$ are real parameters. To exclude negative investments, we set $I_{k+1} = 0$ whenever $a + bP_k < 0$; for $b > 0$ this is the condition $P_k < -a/b$. Similarly, the opponent invests $I'_1 = a'$ in the first round and, in round $k + 1$,

$$I'_{k+1} = a' + b'P'_k, \tag{3}$$

where $P'_k = B(I_k) - C(I'_k)$ is the opponent’s payoff in round $k$ and $a'$ and $b'$ are real parameters. Again we set $I'_{k+1} = 0$ whenever $a' + b'P'_k < 0$. We denote the strategy with parameters $a, b$ by $\varphi_{a,b}$, and the space of all strategies of this form by $S$. The basic problem we address below is how gradual evolutionary change occurs in the strategy space $S$.
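The iterated simultaneous game defined by eqs. (2) and (3) can be simulated directly. This is a sketch under stated assumptions, not the authors’ simulation code. The function name is hypothetical. The normalization of “expected mean payoff per round” is an assumed convention: round $k$ is weighted by its probability $w^{k-1}$ of being played, and the total is divided by the sum of the weights. Negative investments are set to 0, as in the text.

```python
from typing import Callable

Fn = Callable[[float], float]


def mean_payoffs_simultaneous(
    a: float, b: float, a2: float, b2: float,
    B: Fn, C: Fn, w: float = 0.99, tol: float = 1e-12,
) -> tuple[float, float]:
    """Approximate E_w(phi_{a,b}, phi_{a2,b2}) and E_w(phi_{a2,b2}, phi_{a,b}).

    Round k is reached with probability w**(k - 1). Weighted payoffs are
    divided by the (truncated) sum of weights, which approximates 1/(1 - w);
    the sum is cut off once a round's weight drops below `tol`.
    """
    I, I2 = max(a, 0.0), max(a2, 0.0)   # first-round investments I_1, I'_1
    total = total2 = norm = 0.0
    weight = 1.0
    while weight > tol:
        P = B(I2) - C(I)                 # player's payoff P_k
        P2 = B(I) - C(I2)                # opponent's payoff P'_k
        total += weight * P
        total2 += weight * P2
        norm += weight
        I = max(a + b * P, 0.0)          # eq. (2), negative values set to 0
        I2 = max(a2 + b2 * P2, 0.0)      # eq. (3), negative values set to 0
        weight *= w
    return total / norm, total2 / norm


if __name__ == "__main__":
    lin_B = lambda I: 2.0 * I   # B(I) = B0 * I with B0 = 2 (assumed)
    lin_C = lambda I: 1.0 * I   # C(I) = C0 * I with C0 = 1 (assumed)
    print(mean_payoffs_simultaneous(0.1, 0.2, 0.1, 0.2, lin_B, lin_C))
```

With these assumed linear functions and $a = a' = 0.1$, $b = b' = 0.2$, both mean payoffs come out close to 0.125. That is the value at which $P = (B_0 - C_0)(a + bP)$, i.e., the investment $a + bP$ regenerates the payoff $P$.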
Figure 3: Possible cost and benefit functions in the continuous prisoner’s dilemma.

- A: Linear cost function, $C(I) = C_0 \cdot I$, and linear benefit function, $B(I) = B_0 \cdot I$. Such functions were used for the numerical simulations shown in figures 4A and 5A.
- B: Linear cost function, $C(I) = C_0 \cdot I$, and nonlinear benefit function, $B(I) = a(1 - \exp[-b \cdot I])$. With $a, b > 0$, this nonlinearity implies diminishing returns for increased investments $I$. Such functions were used for the numerical simulations shown in figures 4B and 5B.
- C: Nonlinear cost function, $C(I) = C_0 \cdot I^2$, implying accelerated costs for increased investments, and nonlinear benefit function, $B(I) = a(1 - \exp[-b \cdot I])$. Such functions were used for the numerical simulations shown in figures 4C and 5C.

So far we have considered the continuous prisoner’s dilemma in the case where both players move simultaneously. It is also possible to consider a variant in which the players move in alternating sequence (see Frean 1994; Nowak and Sigmund 1994). In the alternating game, if player 1, say, moves by investing $I$, then player 2 responds by investing $I'$. The move of player 1 and the response by player 2 constitute one round of the alternating game. Suppose player 1 moves first, giving the alternating sequence of moves $(I_1, I'_1, I_2, I'_2, \ldots)$. The sequence of payoffs for the two players is then

$$\bigl(B(I'_1) - C(I_1),\; B(I_2) - C(I'_1),\; B(I'_2) - C(I_2),\; B(I_3) - C(I'_2),\; \ldots\bigr).$$

If player 2 moves first, yielding the sequence of moves $(I'_1, I_1, I'_2, I_2, \ldots)$, then the sequence of payoffs is

$$\bigl(B(I_1) - C(I'_1),\; B(I'_2) - C(I_1),\; B(I_2) - C(I'_2),\; B(I'_3) - C(I_2),\; \ldots\bigr).$$

The payoff-based strategy for the alternating game is defined in exact analogy with the simultaneous case (cf. eqq. [1], [2]). When player 1 moves first, players 1 and 2 invest $I_1 = a$ and $I'_1 = a'$, respectively, in the first round. In round $k + 1$, they invest

$$I_{k+1} = a + b\,[B(I'_k) - C(I_k)], \tag{4a}$$
$$I'_{k+1} = a' + b'\,[B(I_{k+1}) - C(I'_k)], \tag{4b}$$

respectively. When player 2 moves first, players 2 and 1 invest $I'_1 = a'$ and $I_1 = a$, respectively, in the first round. In round $k + 1$, they invest

$$I'_{k+1} = a' + b'\,[B(I_k) - C(I'_k)], \tag{5a}$$
$$I_{k+1} = a + b\,[B(I'_{k+1}) - C(I_k)], \tag{5b}$$

respectively. In either case, we again set the investments in round $k + 1$ to 0 if the right-hand sides of equations (4) and (5) are negative. As for the simultaneous game, the payoff from a given iteration of the alternating game is defined as the expected mean payoff per round. Because this payoff depends on which player moves first, the final payoff in the alternating game is the mean of the payoffs obtained in these two cases. In this article, we study the evolution of cooperation via payoff-based strategies in both the simultaneous and the alternating iterated continuous prisoner’s dilemma.
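The alternating variant differs only in timing: each player reacts to the payoff generated by the other’s most recent move, as in eqs. (4a), (4b) and (5a), (5b). This is a minimal sketch, not the authors’ implementation. It uses the same assumed payoff normalization as a simultaneous-game simulation would, weighting round $k$ by $w^{k-1}$. The function names are hypothetical. Following the text, the final payoff averages over the two possible orders of play.

```python
from typing import Callable

Fn = Callable[[float], float]


def _one_order(af, bf, as_, bs, B: Fn, C: Fn, w: float, tol: float):
    """Mean payoff per round to (first mover, second mover) when the first
    mover uses phi_{af,bf} and the second mover uses phi_{as_,bs}."""
    x, y = max(af, 0.0), max(as_, 0.0)   # first-round investments
    tot_f = tot_s = norm = 0.0
    weight = 1.0
    while weight > tol:
        pay_f = B(y) - C(x)              # first mover's payoff this round
        x = max(af + bf * pay_f, 0.0)    # its next investment, cf. (4a)/(5a)
        pay_s = B(x) - C(y)              # second mover's payoff this round
        y = max(as_ + bs * pay_s, 0.0)   # its next investment, cf. (4b)/(5b)
        tot_f += weight * pay_f
        tot_s += weight * pay_s
        norm += weight
        weight *= w
    return tot_f / norm, tot_s / norm


def mean_payoffs_alternating(a, b, a2, b2, B: Fn, C: Fn, w=0.99, tol=1e-12):
    """Average of the two orders of play (player 1 first, player 2 first)."""
    p1_first, p2_second = _one_order(a, b, a2, b2, B, C, w, tol)
    p2_first, p1_second = _one_order(a2, b2, a, b, B, C, w, tol)
    return (p1_first + p1_second) / 2, (p2_first + p2_second) / 2


if __name__ == "__main__":
    lin_B = lambda I: 2.0 * I   # assumed B0 = 2
    lin_C = lambda I: 1.0 * I   # assumed C0 = 1
    print(mean_payoffs_alternating(0.1, 0.2, 0.1, 0.2, lin_B, lin_C))
```

For two identical strategies, the averaging over orders makes both players’ payoffs equal by construction. A player that never invests still profits at the expense of a payoff-based partner whose investment stays positive.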
Analytic Results and Evolutionary Simulations

In this section, we consider the fundamentally important question of whether strategies of the form $\varphi_{a,b}$ can lead to the evolution of cooperation in the iterated continuous prisoner’s dilemma. A strategy $\varphi_{a,b}$ is characterized by the two parameters $a$ and $b$, so evolution takes place in a two-dimensional phenotype space. In this space, the point $(0, 0)$ corresponds to a totally noncooperative strategy that never invests. Cooperative strategies are points $(a, b)$ in the positive quadrant that lie far enough from the origin to induce nonzero investments and, hence, positive payoffs. The principal question is therefore the following: in a population of noncooperative strategies with very small $a$ and $b$, does selection drive these parameters to higher values that induce cooperation? We study this question both analytically, using ideas of adaptive dynamics, and through numerical simulations.

Adaptive Dynamics

Adaptive dynamics (Hofbauer and Sigmund 1990; Nowak and Sigmund 1990; Metz et al. 1996; Geritz et al. 1998; Hofbauer and Sigmund 1998) provide a convenient analytic framework for investigating the evolutionary dynamics of the iterated continuous prisoner’s dilemma. In general, adaptive dynamics assumes that the population under consideration remains essentially homogeneous and evolves under a selection-mutation regime. Mutations are assumed to be small and rare, so that a mutant has either vanished or taken over the population before the next mutation arises. Under these assumptions, one can define a deterministic dynamics in phenotype space that follows the selection gradients, obtained as derivatives of a mutant’s payoff. These selection gradients point in the direction of maximal increase of the mutant’s fitness.

In general, consider a continuous $n$-dimensional phenotypic trait $y = (y_1, \ldots, y_n)$, whose components $y_i$, $i = 1, \ldots, n$, determine the phenotype of an individual. In the case at hand, $n = 2$, $y_1 = a$, and $y_2 = b$. To derive the adaptive dynamics, we assume that the population is essentially homogeneous. All individuals use the same strategy $y$, except for an occasional mutant using a strategy $z$ close to $y$. The payoff to such a mutant is $E(z, y)$. The adaptive dynamics (Hofbauer and Sigmund 1990, 1998; Nowak and Sigmund 1990; Metz et al. 1996; Geritz et al. 1998) are then defined by

$$\dot{y}_i = \left.\frac{\partial}{\partial z_i} E(z, y)\right|_{z = y}. \tag{6}$$

This defines a vector field on the space of strategies. For any $y$, it points in the direction of maximal increase of the mutant’s fitness advantage relative to the resident population.

This description applies both in the quantitative genetics framework of Lande (1982) and in the adaptive dynamics framework of Metz et al. (1996), Dieckmann and Law (1996), and Geritz et al. (1998). In these frameworks, the vector of fitness gradients $(\partial/\partial z_i) E(z, y)|_{z=y}$, $i = 1, \ldots, n$, on the right-hand side of (6) must be multiplied by an $n \times n$ matrix describing the mutational process. This gives the vector of phenotypic changes $\dot{y}_i$, $i = 1, \ldots, n$. The mutational matrix describes how the mutational variance-covariance structure among the $n$ phenotypic components $y_i$ influences the direction of change in phenotype space. Here we make the simplifying assumption that the covariance between mutations in the two components $y_1 = a$ and $y_2 = b$ is 0. In other words, mutations in $a$ occur completely independently of mutations in $b$, so there are no constraining relations (e.g., trade-offs) between these traits. As a consequence, the matrix describing the mutational process is diagonal. The change in each phenotypic direction $y_i$ is therefore proportional to the fitness gradient $(\partial/\partial z_i) E(z, y)|_{z=y}$. Up to positive constants describing the rate and magnitude of mutations in each direction $y_i$, (6) thus accurately describes the evolutionary dynamics. Finally, we are only interested in whether phenotypes evolve away from the origin. That is, we care only about the sign of the fitness gradients $(\partial/\partial z_i) E(z, y)|_{z=y}$ for small $y_i$. Neglecting the constants of proportionality is therefore appropriate for our purposes, and (6) is a correct description of the adaptive process.
(9b) aandbtheevolutionarydynamicsonthespaceSissuch (C b(cid:1)1)(C b(cid:1)(cid:1)1)(cid:2)(B b)(B b(cid:1)) 0 0 0 0 that it yields strategies with larger values of a and b. If thisistrue,thenitfollowsthatnostrategyj willevolve a,b into the unconditional defector j . Analysis of the Jacobian matrix of the dynamical system 0,0 Thecontinuousprisoner’sdilemma,aswehavedefined defined by equations (7a) and (7b) at the fixed point it in the previous section, depends on, in general, the (P,P(cid:1)) shows that the fixed point will be (globally) as- nonlinear benefit and cost functions B and C. However, ymptotically stable if and only if 211(cid:1)bb(cid:1)(C2(cid:2)B2)1 0 0 intheproblemunderconsiderationhere,weareconcerned (b(cid:1)b(cid:1))C , which is satisfied for all sufficiently small b 0 with the evolution of strategies j , where a and b are and b(cid:1). Thus for sufficiently small values of b and b(cid:1) the small. For small values of a and ba,,bthe investmentdeter- payoffs PpE1(j,j(cid:1)) and P(cid:1)pE1(j(cid:1),j) in the infinitely minedbyj willalsobesmall.Hence,forourpurposes, iterated game are given by equations (9a) and (9b). We a,b we need only consider the functions B(I) and C(I) for note that because we are interested in the question of small I. We are free, therefore, to approximate B(I) and whether payoff-basedstrategies of theformja,b allowco- C(I)bytheirlinearizationsBIandCI,respectively,where operation to develop, which is equivalent to the question 0 0 B pB(cid:1)(0)andC pC(cid:1)(0)arethederivativeofthefunc- of whether strategies with small values of a and b evolve 0 0 tions at Ip0. This replacement of nonlinear functions into strategies with larger values of a and b, we are in- terested in the evolutionary dynamics of the system in by linear ones results in an important simplification. precisely the range of small a and b values for which In order to give an analytical treatment of the evolu- expressions (9a) and (9b) are valid. 
tionarydynamicsoftheiteratedcontinuousprisoner’sdi- In the problem under consideration, the general equa- lemma, it is convenient to introduce one further simpli- tions of adaptive dynamics defined by equation (6) take fication: namely, to consider the limiting case of an the form infinitely iteratedgame.Theanalyticalresultsthatweob- tain for the infinitely iterated case will also hold for any finitely iterated game of sufficient length. F Wefirstdiscussthecaseofthesimultaneousgame.Con- (cid:1)E (j(cid:1),j) sider an infinitely iteratedgame of thesimultaneouscon- a˙ p 1 , (10a) tinuous prisoner’s dilemma played between a strategy (cid:1)a(cid:1) (a(cid:1),b(cid:1))p(a,b) ja,b and ja(cid:1)(cid:1),b(cid:1). Denote the payoff of j against j(cid:1) in round F k of the game by P and similarly the payoff ofj(cid:1) against k (cid:1)E (j(cid:1),j) j in round k by P(cid:1). It follows from the definition of the b˙ p 1 . (10b) strategiesjandj(cid:1)k(seeeqq.[2],[3])thatPk(cid:1)1andPk(cid:1)(cid:1) 1are (cid:1)b(cid:1) (a(cid:1),b(cid:1))p(a,b) given by the recursion relations Here we regard the strategy jpj as comprising the a,b Pk(cid:1)1pB07(a(cid:1)(cid:1)b(cid:1)Pk(cid:1))(cid:2)C07(a(cid:1)bPk), (7a) rItesfiodlelonwtspforpoumlaetiqounataionndsj(9(cid:1)ap)ajnad(cid:1),b((cid:1)9abs)athmatutthaenteqsutraattieognys. P(cid:1) pB 7(a(cid:1)bP)(cid:2)C 7(a(cid:1)(cid:1)b(cid:1)P(cid:1)). (7b) of adaptive dynamics assume the explicit form k(cid:1)1 0 k 0 k Cooperation with Variable Investment 429 B (B b)(cid:2)C (C b(cid:1)1) We have proved the threshold theorem here for the a˙ p 0 0 0 0 , (11a) (C b(cid:1)1)2(cid:2)(B b)2 simultaneous form of the iterated continuous prisoner’s 0 0 dilemma. 
However, the theorem also holds for the alter- b˙ p(B0b(cid:1)C0b(cid:1)1)(B0(cid:2)C0)a natingformofthegame.Thisfollowsfromtheobservation [(C b(cid:1)1)2(cid:2)(B b)2]2 that in the infinitelyiteratedalternatinggametheasymp- 0 0 totic payoffs satisfy the same fixed-point equations as in #[B (B b)(cid:2)C (C b(cid:1)1)]. (11b) 0 0 0 0 the infinitely iterated simultaneous game (and the fixed point in the alternating case is also stable for sufficiently For sufficiently small b, the denominators in (11a) and small values of b and b(cid:1)). Thus, the argument used here (11b)arepositive.Inaddition,wemusthaveB 1C (oth- 0 0 to derive the threshold theorem for the infinitelyiterated erwisecostswouldexceedbenefitsforanyinvestment,and simultaneous game also applies to the infinitely iterated cooperation would not evolve for trivial reasons), and alternating game. Although we have only proved the hence the expression(B0b(cid:1)C0b(cid:1)1)(B0(cid:2)C0)a appear- threshold theorem for infinitely iterated games, theresult ing in the nominator of (11b) is also positive. Therefore, willcontinuetoholdforfinitelyiteratedsimultaneousand growth of both a and b is determined by the sign of alternating games of sufficient length. We will comment B0(B0b)(cid:2)C0(C0b(cid:1)1). Thus we obtain the “threshold furtheronthesignificanceofthisresult,whichshouldnot theorem”: bemisconstruedassayingthatpayoff-basedstrategiesnec- essarilyperformequallywellinboththesimultaneousand Threshold Theorem. Given any initial strategy alternating games, in “Discussion.” j ,whereaandbaresmall,ifbexceedsthethresh- a,b oldvalueb pC(cid:1)(0)/[B(cid:1)(0)2(cid:2)C(cid:1)(0)2],thentheevo- C lutionary dynamics will act to increase the valuesof Evolutionary Simulations a and b. 
We have seen above that the framework of adaptive dy- It follows from the threshold theorem that any strategy namics allows an analytic discussion of the evolution of j ,witha,bsmallandb1b ,willevolveintoastrategy cooperationviapayoff-basedstrategies.Theanalytictreat- a,b C withlargeraandbvalues.Itisalsoaconsequenceofthe ment was possible only under a number of assumptions: threshold theorem that any strategy j , which satisfies thatthepopulationremainsessentiallyhomogeneous,that a,b the threshold condition b1b , will not evolve into the the game is infinitely iterated, and that the analysis is C unconditional defector j . The threshold theorem also restricted to strategies with small values of a and b. To 0,0 impliesthatgivenanyinitialstrategyj ,witha,bsmall, obtainaninsightintotheevolutionarydynamicsofpayoff- a,b and given a fixed cost per unit investment C, evolution basedstrategieswithouthavingtomakesuchassumptions, 0 willincreaseaandbifthebenefitperunitinvestmentB we have to resort to simulations. We consider thefinitely 0 is large enough (so that b is smaller than b). Thus, co- iteratedcontinuousprisoner’sdilemma,withafixedprob- C operation will gradually evolve from a noncooperative ability w!1 of a further move after any round and with state if the benefits per unit investment are large enough defined benefit and cost functions B and C. Todefinethe compared with the costs. simulationscheme,weintroduceapopulationofstrategies It is important to note that for sufficiently large values j for ip1,…,n, where j denotes the strategy j . If i i ai,bi ofaandbtheargumentsleadingtothethresholdtheorem p is the frequency of strategy i, then the fitness of j is i i will no longer remain valid. This is because, first, the as- W pSpE (j,j)andthemeanfitnessofthepopulation i j j w i j sumption that we can linearize the costand benefitfunc- is WpSpW. 
The population is assumed to evolve ac- i i i tionswillnolongerbeagoodapproximationand,second, cording to the standard replicator dynamics (Maynard thepayoffswillnolongerconvergetothefixedpointgiven Smith 1982; Hofbauer and Sigmund 1998); that is, the byequations(9a)and(9b).However,inspiteofthislim- frequency of strategy i in the next generation,p(cid:1), is given i itation, the threshold theorem guarantees that, under the byp(cid:1)ppW/W.Tocarryoutthesimulation,westartwith i i i standard assumptions of adaptive dynamics, any strategy aninitialpopulationofstrategiesj0withinitialfrequencies i j ,witha,bsmallandb1b ,willevolveintoastrategy p0 and allow the population to evolve according to the a,b C i with larger a and b and, consequently, that any strategy replicator dynamics. Every N generations, on average, we thatsatisfiesthethresholdconditionwillneverevolveinto introduce a randomly generated mutant strategy into the the unconditional defector. Thus, we can conclude from population and continue to follow the evolution of the thisanalyticalresultthattheclassofpayoff-basedstrategies system.Ifthefrequencyofanystrategyfallsbelowathresh- j allowsmorecooperativebehaviortoevolvefrominitial old (cid:2), then that strategy is removed from thepopulation. a,b strategies that are rather uncooperative and allows coop- Sincetheevolutionarydynamicsarefrequencydependent, erative behavior that has evolved to be maintained. it will typically maintain a heterogeneous population of 430 The American Naturalist strategies,althoughthenumberofdistinctstrategiespres- strategiesaredefinedasfollows:Raisethestakes(RTS)is, ent at any one time may be quite small (formoreonthis ingeneral,characterizedbytwoparameters(a,b).Onthe effect, in the context of the standard iterated prisoner’s first move, RTS invests a; subsequently, if the opponent’s dilemma, see Nowak and Sigmund 1993, where a similar move exceeds RTS’s previousinvestment,thenRTSraises simulation scheme is used). 
itsinvestmentby2b,while,iftheopponentmatchedRTS’s The evolutionary dynamics of typical simulations are previousmove,thenRTSraisesitsinvestmentbyb.Ifthe shown in figures 4 and 5. Figures 4A, 4B, and 4C show opponentinvestslessthanRTS’spreviousmove,thenRTS the evolutionary dynamics of the simultaneous game for matchestheopponent’slastinvestment.Thelinearreactive (A)linearcostsandbenefits,(B)linearcostsandnonlinear strategies, which are characterized by three parameters benefits, and (C) nonlinear costs and benefits. The cor- (a,d,r), are defined as follows: On the first move, invest responding cases for the alternating game are shown in a; on subsequent moves, invest d(cid:1)rI(cid:1), where I(cid:1) is the figures 5A, 5B, and 5C. We see in these simulations that, opponent’s previous investment. Itis alsoassumedinthe starting from a population with small values of a and b, definition of the linear reactive strategies that there is a with b exceeding the critical value b pC(cid:1)(0)/[B(cid:1)(0)2(cid:2) fixed range of possible investments, withIp0 being the C C(cid:1)(0)2], the population mean values of a and b evolve to minimum investment and IpI being the maximum max higher values, where they remain. This corresponds to a investment (see Wahl and Nowak 1999a, 1999b). It is as- population of uncooperative individuals evolving into a sumedforboththesestrategiesthatthegameisplayedin population of highly cooperative individuals. Our evolu- alternatingsequence(seeRobertsandSherratt1998;Wahl tionarysimulationsconfirmtheanalyticalresultsobtained and Nowak 1999a, 1999b). above and show that these results continue to hold even Bothraise-the-stakesandthelinearreactivestrategiesde- if the populations involved are heterogeneous and the pend only on the opponent’s last move and, in this sense, game is finitely iterated. 
Thus, evolutionary simulations arereminiscentofstrategiessuchastitfortatinthestandard corroborate the insights obtained from the analytic ap- (noncontinuous) iterated prisoner’s dilemma. In view of proachandshowthatpayoff-basedstrategiesareageneral the success of strategies such as Pavlov for the standard androbustmeansofobtainingcooperationintheiterated game (in the simultaneous case), which depend on both continuous prisoner’s dilemma, inboththesimultaneous players’ previous moves, it is reasonable to anticipate that and alternating cases. strategies for the iterated continuous prisoner’s dilemma that depend on both players’ moves will be of similar im- portance.Inadditiontosimplydependingonbothplayers’ Discussion previousmoves,thepayoff-basedstrategiesintroducedhere We have shown, both analytically and using evolutionary have the attractive property that the amount individuals simulations,thatpayoff-basedstrategiesfortheiteratedcon- invest is dependent on how well the individual is doingat tinuous prisoner’s dilemma provide a natural framework any given moment. That is, for given strategy parameters, for understanding the evolution of cooperation via recip- if an individual’s payoff is high, it will invest more, and if rocalaltruismwhenthedegreeofcooperationcanvary.In its payoff is low, it will invest less. This general property particular,wehaveshownthatsuchstrategiesallowasimple seems to agree with our biological intuition. resolutionofthefundamentalproblemofhowcooperative Itisimportanttonotethatalthoughpayoff-basedstrat- behaviorcanevolvegraduallyfromaninitiallyselfishstate. egiesareanalogoustostrategiessuchasPavlovinthesense Inthissection,weshalldiscusshowourpayoff-basedstrat- that they depend on the previous moves of both players, egiescontrastwithotherstrategiesthathavebeenproposed they do not behave in an identical way to Pavlov. Inpar- for the iterated continuous prisoner’s dilemma. 
We shall ticular, after receiving the score T (i.e., Pavlov defects also discuss here whether there is any sense in which the against a cooperator), Pavlov plays D again. By contrast, evolutionofcooperationviareciprocalaltruismisfacilitated after receiving a high score (e.g., a payoff-based strategy bycooperationbeingacontinuoustrait.Finally,weconsider makes a low investment against a high-investing oppo- how we may distinguish empirically between payoff-based nent), a payoff-based strategy will increase its level of in- strategiesandstrategiesthatdependonlyontheopponent’s vestment in the next round. Thus payoff-based strategies previous move. behave in a way that is rather like tit for tat. Moreover, First,letuscontrastandcomparethepayoff-basedstrat- in such a situation, the strategy that made the low in- egies we have introduced here with the other strategies vestment will gradually increase its investment level back that have been proposed fortheiteratedcontinuouspris- toahighercooperativelevel.Therefore,withpayoff-based oner’s dilemma. These other strategies are the “raise-the- strategies occasional mistakes donotleadtoapermanent stakes”strategy(RobertsandSherratt1998)andthelinear breakdownofcooperation.Thispropertymeansthatpay- reactivestrategies(WahlandNowak1999a,1999b).These off-based strategies really behave more like generous tit
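The payoff recursion (7a) and (7b), its fixed point (9a) and (9b), and the stability condition $2 > 1 + bb'(C_0^2 - B_0^2) > (b + b')C_0$ can be checked numerically. The following Python sketch does this under the linearized payoffs $B(I) = B_0 I$ and $C(I) = C_0 I$ used in the analysis. It starts from $P_0 = P'_0 = 0$, which is an assumption: the paper's initial condition is fixed by eqq. [2] and [3], which are not reproduced in this section. The function names and parameter values are hypothetical.

```python
# Sketch: payoff recursion (7a)-(7b) vs. closed-form fixed point (9a)-(9b).
# Assumes linear benefit B(I) = B0*I and cost C(I) = C0*I; names are hypothetical.


def step(P, Pm, a, b, am, bm, B0, C0):
    """One round of (7a)-(7b): j invests a + b*P and j' invests a' + b'*P'."""
    inv, inv_m = a + b * P, am + bm * Pm
    return B0 * inv_m - C0 * inv, B0 * inv - C0 * inv_m


def fixed_point(a, b, am, bm, B0, C0):
    """Closed-form limit payoffs (P, P') from (9a)-(9b)."""
    D = (C0 * b + 1) * (C0 * bm + 1) - (B0 * b) * (B0 * bm)
    P = ((B0 * bm) * (B0 * a - C0 * am) + (B0 * am - C0 * a) * (C0 * bm + 1)) / D
    Pm = ((B0 * b) * (B0 * am - C0 * a) + (B0 * a - C0 * am) * (C0 * b + 1)) / D
    return P, Pm


def is_stable(b, bm, B0, C0):
    """Stability condition 2 > 1 + b*b'*(C0^2 - B0^2) > (b + b')*C0, for b, b' >= 0."""
    middle = 1 + b * bm * (C0 ** 2 - B0 ** 2)
    return 2 > middle > (b + bm) * C0


if __name__ == "__main__":
    a, b, am, bm, B0, C0 = 0.1, 0.2, 0.12, 0.25, 2.0, 1.0
    P = Pm = 0.0                      # assumed starting payoffs
    for _ in range(300):
        P, Pm = step(P, Pm, a, b, am, bm, B0, C0)
    print(is_stable(b, bm, B0, C0))   # True
    print((P, Pm), fixed_point(a, b, am, bm, B0, C0))
```

With these small values of $b$ and $b'$, the iterated payoffs agree with (9a) and (9b) to within rounding. With large values (e.g., $b = b' = 1.5$), `is_stable` returns `False` and the recursion no longer settles on the fixed point.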
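The step from (9b) to (11a) and (11b) can be spelled out explicitly. Write the mutant payoff (9b) as $P' = N/D$. The numerator $N$ does not depend on $b'$, and the denominator $D$ does not depend on $a'$, so the two partial derivatives required by (10a) and (10b) are:

```latex
\begin{aligned}
N &= (B_0 b)(B_0 a' - C_0 a) + (B_0 a - C_0 a')(C_0 b + 1), \\
D &= (C_0 b + 1)(C_0 b' + 1) - (B_0 b)(B_0 b'), \\
\frac{\partial P'}{\partial a'} &= \frac{B_0 (B_0 b) - C_0 (C_0 b + 1)}{D}, \qquad
\frac{\partial D}{\partial b'} = C_0 (C_0 b + 1) - B_0 (B_0 b), \\
\frac{\partial P'}{\partial b'} &= -\frac{N}{D^2}\,\frac{\partial D}{\partial b'}
  = \frac{N}{D^2}\left[B_0 (B_0 b) - C_0 (C_0 b + 1)\right].
\end{aligned}
```

At $(a', b') = (a, b)$, we have $D = (C_0 b + 1)^2 - (B_0 b)^2$ and $N = (B_0 - C_0)\,a\,(B_0 b + C_0 b + 1)$, which gives (11a) and (11b).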
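The sign argument behind the threshold theorem can also be checked numerically. The Python sketch below does three things. It evaluates (11a) and (11b) and compares them with central finite differences of the mutant payoff $P' = E_1(j', j)$ from (9b). It computes $b_C = C_0/(B_0^2 - C_0^2)$. Finally, it takes a few forward-Euler steps of (11a) and (11b) on either side of the threshold. Linear $B$ and $C$ are assumed, as in the analysis. The function names, step size, and parameter values ($B_0 = 3$, $C_0 = 1$, so $b_C = 0.125$) are illustrative choices, not taken from the paper.

```python
# Sketch: selection gradients (11a)-(11b), a finite-difference check against (9b),
# and the threshold b_C. Assumes linear B and C; all names are hypothetical.


def mutant_payoff(am, bm, a, b, B0, C0):
    """P' = E_1(j', j) from (9b): mutant j_{a',b'} against resident j_{a,b}."""
    D = (C0 * b + 1) * (C0 * bm + 1) - (B0 * b) * (B0 * bm)
    return ((B0 * b) * (B0 * am - C0 * a) + (B0 * a - C0 * am) * (C0 * b + 1)) / D


def gradient_closed_form(a, b, B0, C0):
    """(a_dot, b_dot) from (11a)-(11b)."""
    den = (C0 * b + 1) ** 2 - (B0 * b) ** 2
    s = B0 * (B0 * b) - C0 * (C0 * b + 1)        # factor that fixes the sign
    a_dot = s / den
    b_dot = (B0 * b + C0 * b + 1) * (B0 - C0) * a / den ** 2 * s
    return a_dot, b_dot


def gradient_numeric(a, b, B0, C0, h=1e-6):
    """Central differences of P' in a' and b', evaluated at (a', b') = (a, b)."""
    da = (mutant_payoff(a + h, b, a, b, B0, C0)
          - mutant_payoff(a - h, b, a, b, B0, C0)) / (2 * h)
    db = (mutant_payoff(a, b + h, a, b, B0, C0)
          - mutant_payoff(a, b - h, a, b, B0, C0)) / (2 * h)
    return da, db


def threshold(B0, C0):
    """b_C = C0 / (B0^2 - C0^2); meaningful only when B0 > C0."""
    return C0 / (B0 ** 2 - C0 ** 2)


def euler_path(a, b, B0, C0, dt=0.01, steps=20):
    """Forward-Euler integration of (11a)-(11b); valid only while a and b stay small."""
    for _ in range(steps):
        a_dot, b_dot = gradient_closed_form(a, b, B0, C0)
        a, b = a + dt * a_dot, b + dt * b_dot
    return a, b


if __name__ == "__main__":
    B0, C0 = 3.0, 1.0
    print(threshold(B0, C0))                  # 0.125
    for b in (0.1, 0.2):                      # below and above b_C
        print(b, gradient_closed_form(0.05, b, B0, C0), gradient_numeric(0.05, b, B0, C0))
    print(euler_path(0.05, 0.2, B0, C0))      # a and b both grow
```

Below $b_C$ both gradients are negative and $a$ and $b$ shrink. Above $b_C$ both are positive, which is the content of the threshold theorem for small $a$ and $b$.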
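The simulation scheme described under “Evolutionary Simulations” can be sketched in Python as follows. This is not the authors’ code, and it is not expected to reproduce figures 4 and 5. Several modeling choices here are our own assumptions and are labeled as such in the code:

- linear $B$ and $C$;
- $E_w(j_i, j_j)$ taken as the expected total payoff $\sum_k w^k P_k$, truncated after a fixed number of rounds;
- a reference payoff of 0 before the first round, so the first investment is $a$;
- investments clipped at zero;
- a baseline fitness `base` added so that $W_i$ stays nonnegative;
- mutants produced by Gaussian perturbation of a resident’s $(a, b)$;
- illustrative values for $N$, the removal threshold `cutoff`, and the mutant’s initial frequency.

```python
import random


def game_payoff(s1, s2, B0, C0, w, rounds=200):
    """Expected total payoff of s1 = (a, b) against s2 = (a', b').

    Assumptions (ours): linear B(I) = B0*I and C(I) = C0*I; reference payoff 0
    before round 1, so the first investment is a; investments clipped at 0;
    round k is reached with probability w**k; series truncated after `rounds`.
    """
    (a, b), (am, bm) = s1, s2
    P = Pm = 0.0
    total, weight = 0.0, 1.0
    for _ in range(rounds):
        inv, inv_m = max(0.0, a + b * P), max(0.0, am + bm * Pm)
        P, Pm = B0 * inv_m - C0 * inv, B0 * inv - C0 * inv_m
        total += weight * P
        weight *= w
    return total


def simulate(strategies, freqs, B0=3.0, C0=1.0, w=0.9, generations=1000, N=50,
             cutoff=1e-3, mutant_freq=0.01, sigma=0.02, base=1.0, seed=1):
    """Replicator dynamics p_i' = p_i W_i / W_bar with occasional random mutants."""
    rng = random.Random(seed)
    strategies, freqs = list(strategies), list(freqs)
    cache = {}

    def E(si, sj):
        if (si, sj) not in cache:
            cache[(si, sj)] = game_payoff(si, sj, B0, C0, w)
        return cache[(si, sj)]

    for _ in range(generations):
        # W_i = base + sum_j p_j E_w(j_i, j_j); `base` and max(0, .) are assumptions.
        W = [max(0.0, base + sum(pj * E(si, sj) for sj, pj in zip(strategies, freqs)))
             for si in strategies]
        W_bar = sum(p * Wi for p, Wi in zip(freqs, W))
        freqs = [p * Wi / W_bar for p, Wi in zip(freqs, W)]
        # Remove strategies whose frequency fell below the cutoff, then renormalize.
        kept = [(s, p) for s, p in zip(strategies, freqs) if p >= cutoff]
        total = sum(p for _, p in kept)
        strategies = [s for s, _ in kept]
        freqs = [p / total for _, p in kept]
        # With probability 1/N per generation (every N generations on average),
        # add a mutant that perturbs a randomly chosen resident's (a, b).
        if rng.random() < 1.0 / N:
            a, b = rng.choices(strategies, weights=freqs)[0]
            mutant = (max(0.0, a + rng.gauss(0.0, sigma)),
                      max(0.0, b + rng.gauss(0.0, sigma)))
            freqs = [p * (1.0 - mutant_freq) for p in freqs] + [mutant_freq]
            strategies.append(mutant)
    return strategies, freqs


def mean_traits(strategies, freqs):
    """Population means of a and b."""
    return (sum(p * s[0] for s, p in zip(strategies, freqs)),
            sum(p * s[1] for s, p in zip(strategies, freqs)))
```

For example, `mean_traits(*simulate([(0.05, 0.2)], [1.0]))` follows a population that starts near the origin with $b$ above $b_C = 0.125$ (for $B_0 = 3$, $C_0 = 1$). Because $B_0 \ge C_0$, the two players’ payoffs in any round sum to a nonnegative number. This keeps $\bar{W} \ge$ `base` $> 0$, so the replicator update is always defined.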
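To make the contrast with the payoff-based strategies concrete, the two opponent-only strategies described in “Discussion” can be written as move rules and played in alternating sequence. This Python sketch reflects our reading of the verbal definitions. Three interpretation choices are assumptions:

- RTS raises relative to its own previous investment.
- “Matched” is tested with a small numerical tolerance.
- The linear reactive first move is also clipped to $[0, I_{\max}]$.

The names `rts`, `linear_reactive`, and `play_alternating` are hypothetical.

```python
# Sketch of raise-the-stakes (RTS) and linear reactive strategies in an alternating game.
# Interpretation choices (hypothetical): RTS raises relative to its own previous
# investment, "matched" uses a tolerance, and all linear reactive moves are clipped.


def rts(a, b, tol=1e-12):
    """Raise-the-stakes with parameters (a, b); returns a move rule."""
    def move(own_prev, opp_last):
        if own_prev is None:                    # first move: invest a
            return a
        if opp_last > own_prev + tol:           # opponent exceeded RTS's last investment
            return own_prev + 2 * b
        if abs(opp_last - own_prev) <= tol:     # opponent matched it
            return own_prev + b
        return opp_last                         # opponent invested less: match it
    return move


def linear_reactive(a, d, r, I_max):
    """Linear reactive strategy (a, d, r) with investments confined to [0, I_max]."""
    def move(own_prev, opp_last):
        raw = a if own_prev is None else d + r * opp_last
        return min(max(raw, 0.0), I_max)
    return move


def play_alternating(move1, move2, rounds):
    """Player 1 moves, then player 2 responds; each sees the other's latest move."""
    hist1, hist2 = [], []
    for _ in range(rounds):
        hist1.append(move1(hist1[-1] if hist1 else None, hist2[-1] if hist2 else None))
        hist2.append(move2(hist2[-1] if hist2 else None, hist1[-1]))
    return hist1, hist2


if __name__ == "__main__":
    print(play_alternating(rts(1.0, 0.5), rts(1.0, 0.5), 3))
    # ([1.0, 1.5, 2.5], [1.0, 2.0, 3.0])
    print(play_alternating(rts(1.0, 0.5), linear_reactive(0.0, 0.0, 0.0, 1.0), 3))
    # ([1.0, 0.0, 0.5], [0.0, 0.0, 0.0])
```

Both rules see only the opponent’s last investment (and RTS its own previous one), never a payoff. This is the sense in which they resemble tit for tat rather than the payoff-based strategies $j_{a,b}$.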