Circumventing the Price of Anarchy: Leading Dynamics to Good Behavior

Maria-Florina Balcan¹  Avrim Blum²  Yishay Mansour³
¹School of Computer Science, College of Computing, Georgia Institute of Technology
²Department of Computer Science, Carnegie Mellon University
³Blavatnik School of Computer Science, Tel Aviv University
[email protected]  [email protected]  [email protected]

Abstract: Many natural games can have a dramatic difference between the quality of their best and worst Nash equilibria, even in pure strategies. Yet, nearly all work to date on dynamics shows only convergence to some equilibrium, especially within a polynomial number of steps. In this work we study how agents with some knowledge of the game might be able to quickly (within a polynomial number of steps) find their way to states of quality close to the best equilibrium. We consider two natural learning models in which players choose between greedy behavior and following a proposed good but untrusted strategy, and analyze two important classes of games in this context, fair cost-sharing and consensus games. Both games have extremely high Price of Anarchy and yet we show that behavior in these models can efficiently reach low-cost states.

Keywords: Dynamics in Games, Price of Anarchy, Price of Stability, Cost-sharing games, Consensus games, Learning from untrusted experts

1 Introduction

There has been substantial work in the machine learning, game theory, and (more recently) algorithmic game theory communities on understanding the overall behavior of multi-agent systems in which agents follow natural learning dynamics such as (randomized) best/better response and no-regret learning. For example, it is well known that in potential games, best-response dynamics, in which players take turns each making a best-response move to the current state of all the others, is guaranteed to converge to a pure-strategy Nash equilibrium [24, 27]. Significant effort has been spent recently on analyzing various properties of these dynamics and their variations, in particular on their convergence time [1, 26]. No-regret dynamics have also long been studied. For example, a well-known general result that applies to any finite game is that if players each follow a "no-internal-regret" strategy, then the empirical distribution of play is guaranteed to approach the set of correlated equilibria of the game [17–19].

There has also been a lot of attention recently on fast convergence of both best-response and regret-minimization dynamics to states with cost comparable to the Price of Anarchy of the game [4, 15, 26, 28], whether or not that state is an equilibrium. This line of work is justified by the realization that in many cases the equilibrium nature of the system itself is less important, and what we care more about is having the dynamics reach a low-cost state; this is a position we adopt as well.

However, while the above results are quite general, the behavior or equilibrium reached by the given dynamics could be as bad as the worst equilibrium in the game (the worst pure-strategy Nash equilibrium in the case of best-response dynamics in potential games, and the worst correlated equilibrium for no-regret algorithms in general games). Even for specific efficient algorithms, and even for natural potential games, in general no bounds better than the pure-strategy price of anarchy are known. On the other hand, many important potential games, including fair cost-sharing and consensus games, can have very high-cost Nash equilibria even though they always have low-cost equilibria as well; that is, their Price of Anarchy is extremely high and yet their Price of Stability is low (we discuss these games more in Section 1.1). Thus, a guarantee comparable with the cost of the worst Nash equilibrium can be quite unsatisfactory.
Unfortunately, in general there have been very few results showing natural dynamics that lead to low-cost equilibria or behavior in games of this type, especially within a polynomial number of steps. For potential games, it is known that noisy best-response, in which players act in a "simulated-annealing" style manner (also known as log-linear learning), does have the property that the states of highest probability in the limit are those minimizing the global potential [8, 9, 23]. In fact, as temperature approaches 0, these are the only stochastically stable states. However, as we show in Appendix A.2, reaching such states with non-negligible probability mass may easily require exponential time. For polynomial-time processes, Charikar et al. [11] show natural dynamics rapidly reaching low-cost behavior in a special case of undirected fair cost-sharing games; however, these results require specialized initial conditions and also do not apply to directed graphs.

In this paper we initiate a study of how to aid dynamics, beginning from arbitrary initial conditions, to reach states of cost close to the best Nash equilibrium. We propose a novel angle on this problem by considering whether providing more information to simple learning algorithms about the game being played can allow natural dynamics to reach such high-quality states. At a high level, there are two barriers to simple dynamics performing well. One is computational: for directed cost-sharing games, for instance, we do not even know of efficient centralized procedures for finding low-cost states in general. As an optimization problem this is the Directed Steiner Forest problem, and the best approximation factor known is min(n^(1/2+ε), N^(4/5+ε), m^(2/3)N^ε), where n is the number of players, N is the number of vertices, and m is the number of edges in the graph [12, 16]. The other barrier is incentive-based: even if a low-cost solution were known, there would still be the issue of whether players would individually want to play it, especially without knowing what other players will choose to do. For example, low-cost solutions might be known because people analyzing the specific instance being played might discover and publish low-cost global behaviors. In this case, individual players might then occasionally test out their parts of these behaviors, using them as extra inputs to their own learning algorithm or adaptive dynamics to see if they do in fact provide benefit to themselves. The question then is: can this process allow low-cost states to be reached, and in what kinds of games?

Motivated by this question, in this paper we develop techniques for understanding and influencing the behavior of natural dynamics in games with multiple equilibria, some of which may be of much higher social quality than others. In particular, we consider a model in which a low-cost global behavior is proposed, but individual players do not necessarily trust it (it may not be an equilibrium, and even if it is, they do not know if others will follow it). Instead, players use some form of experts learning algorithm where one expert says to play the best response to the current state and another says to follow the proposed strategy. Our model imposes only very mild conditions on the learning algorithm each player uses: different players may use completely different algorithms for deciding among or updating probabilities on their two high-level choices. Assume that the players move in a random order. Will this produce a low-cost solution (even if it doesn't converge to anything)? We consider two variations of this model: a learn-then-decide model, where players initially follow an "exploration" phase in which they put roughly equal probability on each expert, followed by a "commitment" phase in which, based on their experience, they choose one to use from then on; and a smoothly adaptive model, where they slowly change their probabilities over time. Within each we analyze several important classes of games. In particular, our main results are that for both fair cost-sharing and consensus games, these processes lead to good quality behavior.

Our study is motivated by several lines of work. One is the above-mentioned work on noisy best-response dynamics, which reach high-quality states but only after exponentially many steps. Another is work on the value of altruism [29], which considers how certain players acting altruistically can help the system reach a better state. The last is the work in [5], which considers a central authority who can temporarily control a random subset of the players in order to get the system into a good state. That model is related to the model we consider here, but is much more rigid because it posits two classes of players (one that follows the given instructions and one that doesn't) and does not allow players to adaptively decide for themselves. (See Section 1.2 for more discussion.)

1.1 Our Results

As described above, we introduce and analyze two models for guiding dynamics to good equilibria, and within these models we prove strong positive results for two classes of games, fair cost sharing and consensus games. In n-player fair cost sharing games, players choose routes in a network and split the cost of edges they take with others using the same edge; see Section 3 for a formal definition.

Figure 1: A directed cost-sharing game that models a setting where each player can choose either to drive its own car to work at a cost of 1, or share public transportation with others, splitting an overall cost of k. For any 1 < k < n, if players arrive one at a time and each greedily chooses a path minimizing its cost, then the cost of the equilibrium obtained is n, whereas OPT has cost only k.
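The greedy-arrival process described in Figure 1 is easy to simulate; the sketch below (our own helper, not code from the paper) reproduces the gap between the greedy equilibrium and OPT:

```python
def greedy_arrival(n, k):
    """Players arrive one at a time; each compares driving alone (cost 1)
    with joining the shared route, whose cost k would be split among the
    players already on it plus itself, and greedily picks the cheaper option."""
    on_shared = 0
    for _ in range(n):
        if k / (on_shared + 1) < 1:   # its cost share if it joins now
            on_shared += 1            # join public transportation
        # otherwise it drives alone at cost 1
    return (n - on_shared) + (k if on_shared > 0 else 0)  # social cost reached
```

For any 1 < k < n, the first arrival already sees a share of k > 1 on the empty shared route, so by induction every player drives and the social cost is n, while the optimal state, with everyone sharing, costs only k.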
These games can model scenarios such as whether to drive one's own car to work or to share public transportation with others (see Figure 1 for a simple example), and can have equilibria as much as a factor of n worse than optimal, even though they are always guaranteed to have low-cost equilibria that are only O(log n) worse than optimal as well. In consensus games, players are nodes in a network who each need to choose a color, and they pay a cost for each neighbor of different color than their own (see Section 4). These again can have a wide gap between worst and best Nash equilibrium (in this case, cost Ω(n²) for the worst versus 0 for the best).

Our main results for these games are the following. For fair cost-sharing, we show that in the learn-then-decide model, so long as the exploration phase has sufficient (polynomial) length and the proposed strategy is near-optimal, the expected cost of the system reached is O(log(n) log(nm) OPT). Thus, this is only slightly larger than that given by the price of stability of the game. For the smoothly-adaptive model, if there are many players of each type (i.e., associated to each (s_i, t_i) pair) we can do even better, with high probability achieving cost O(log(nm) OPT), or even O(OPT) if the number of players of each type is high enough. Note that with many players of each type the price of anarchy remains Ω(n), though the price of stability becomes O(1). For consensus games, we show that so long as players place probability β > 1/2 on the proposed optimal strategy, then with high probability play will reach the exact optimal behavior within a polynomial number of steps. Moreover, for certain natural graphs such as the line and the grid, any β > 0 is sufficient. Note that our results are actually stronger than those achievable in the more centralized model of [5], where one needs to make certain minimal degree assumptions on the graph as well.

In both our models it is an easy observation that for any game, if the proposed solution is a good equilibrium, then in the limit the system will eventually reach the equilibrium and stay there indefinitely. Our interest, however, is in polynomial-time behavior.

1.2 Related Work

Dynamics and Convergence to Equilibria: It is well known that in potential games, best-response dynamics, in which players take turns each making a best-response move to the current state of all the others, is guaranteed to converge to a pure-strategy Nash equilibrium [24, 27]. Significant effort has been spent recently on the convergence time of these dynamics [1, 26], with both examples of games in which such dynamics can take exponential time to converge, and results on fast convergence to states with cost comparable to the Price of Anarchy of the game [4, 15].

Another well-known general result that applies to any finite game is that if players each follow a "no-internal-regret" strategy, then the empirical distribution of play is guaranteed to approach the set of correlated equilibria of the game [17–19]. In particular, for good no-regret algorithms, the empirical distribution will be an ε-correlated equilibrium after O(1/ε²) rounds of play [7]. A recent result of [20] analyzes a version of the weighted-majority algorithm in congestion games, showing that behavior converges to the set of weakly-stable equilibria. This implies performance better than the worst correlated equilibrium, but even in this case the guarantee is no better than the worst pure-strategy Nash equilibrium.

Noisy best-response has been shown in the limit to reach states of minimum global potential [8, 9, 23], and thus provides strong positive results in games with a small gap between potential and cost. However, this convergence may take exponential time, even for fair cost-sharing games. In particular, we show in Appendix A.2 that if one makes many copies of the edges of cost 1 in the example of Figure 1, reaching a state with sufficiently many players on the shared path to induce others to follow along would take time exponential in k, even if the algorithm can arbitrarily vary its temperature parameter. For details see Appendix A.2.

An alternative line of work assumes that the system starts empty and players join one at a time. Charikar et al. [11] analyze fair cost sharing in this setting on an undirected graph where all players have a common sink. Their model has two phases: in the first, players enter one at a time and use a greedy algorithm to connect to the current tree, and in the second phase the players undergo best-response dynamics. They show that in this case a good equilibrium (one that is within only a polylog(n) factor of optimal) is reached. We discuss this result further in Appendix A. We remark here that for directed graphs, it is easy to construct simple examples where this process reaches an equilibrium that is Ω(n) from optimal (see Figure 1). For scheduling on unrelated machines with makespan cost function, the natural greedy algorithm to assign incoming jobs is an O(m) approximation, and a more sophisticated online algorithm guarantees an O(log m) approximation [3]. This implies that a two-phase model, where in the first phase the jobs join one at a time and use the greedy algorithm (or that of [3]) and in the second phase they perform best-response dynamics, achieves cost within a factor O(m) (or O(log m)) of optimal. This is in contrast to the unbounded Price of Anarchy in general.

Taxation: There has also been work on using taxes to improve the quality of behavior [13, 14]. Here the aim is, via taxes, to adjust the utilities of each player such that the only Nash equilibria in the new game correspond to optimal or near-optimal behavior in the original game. In contrast, our focus is to identify how dynamics can be made to reach a good result without changing the game or adjusting utilities, but rather by injecting more information into the system.

Public service advertising: Finally, the public service advertising model of [5] also uses the idea of a proposed strategy in order to move players into a good equilibrium. In the model of [5], the players select (randomly) only once between following the proposed strategy or not. Players then stick with their decision, while those that decided not to follow the proposed strategy settle on some equilibrium among themselves (given that the other players' actions are fixed). Then, in the last phase, all players perform best-response dynamics to converge to a final equilibrium (convergence is guaranteed since the discussion is limited to potential games). In contrast, in our models the players continuously randomly re-select between following the proposed strategy and performing a best response.
This continuous randomization makes the analysis of our dynamics technically more challenging. Conceptually, the benefit of our model is that the players are "symmetric" and can continuously switch between the two alternatives, which better models selfish behavior. This continuous process is what enables the players to both explore and exploit the two alternative actions.

2 A Formal Framework

2.1 Notation and Definitions

We start by providing general notation and definitions. A game is denoted by a tuple G = ⟨N, (S_i), (cost_i)⟩, where N is a set of n players, S_i is the finite action space of player i ∈ N, and cost_i is the cost function of player i. The joint action space of the players is S = S_1 × ... × S_n. For a joint action s ∈ S we denote by s_{-i} the actions of the players j ≠ i, i.e., s_{-i} = (s_1, ..., s_{i-1}, s_{i+1}, ..., s_n). The cost function of player i maps a joint action s ∈ S to a real non-negative number, i.e., cost_i : S → R⁺. Every game has an associated social cost function cost : S → R that maps a joint action to a real value. In the cases discussed in this paper the social cost is simply the sum of the players' costs, i.e.,

cost(s) = Σ_{i=1}^{n} cost_i(s).

The optimal social cost is OPT(G) = min_{s∈S} cost(s). We sometimes overload notation and use OPT for a joint action s that achieves cost OPT(G).

Given a joint action s, the Best Response (BR) of player i is the set of actions BR_i(s) that minimize its cost, given the other players' actions s_{-i}, i.e., BR_i(s_{-i}) = argmin_{a∈S_i} cost_i(a, s_{-i}). A joint action s ∈ S is a pure Nash Equilibrium (NE) if no player i ∈ N can benefit from unilaterally deviating to another action; namely, every player is playing a best response action in s, i.e., s_i ∈ BR_i(s_{-i}) for every i ∈ N. A best response dynamics is a process in which at each time step, some player which is not playing a best response switches its action to a best response action, given the current joint action. In this paper we focus on potential games, which have the property that any best response dynamics converges to a pure Nash equilibrium [24].

Let N(G) be the set of Nash equilibria of the game G. The Price of Anarchy (PoA) is defined as the ratio between the maximum cost of a Nash equilibrium and the social optimum, i.e., max_{s∈N(G)} cost(s)/OPT(G). The Price of Stability (PoS) is the ratio between the minimum cost of a Nash equilibrium and the social optimum, i.e., min_{s∈N(G)} cost(s)/OPT(G). For a class of games, the PoA and PoS are the maximum over all games in the class.

2.2 The Model

We now describe the formal model that we consider. Initially, players begin in some arbitrary state, which could be a high-cost equilibrium or even a state that is not an equilibrium at all. Next, an entity, perhaps the designer of the system or a player who has studied the system well, proposes some better global behavior B. B may or may not be an equilibrium behavior; our only assumption is that it have low overall cost. Now, players move one at a time in a random order. Each player, when it is their turn to move, chooses among two options. The first is to simply behave greedily and make a best-response move to the current configuration. The second option is to follow their part of the given behavior B. These two high-level strategies (best-response or follow B) are viewed as two "experts," and the player then runs some learning algorithm aiming to learn which of these is most suitable for himself. Note that best-response is an abstract option: the specific action it corresponds to may change over time.

Because moves occur in an asynchronous manner, there are multiple reasonable ways to model the feedback each player gives to its learning algorithm: for instance, does it consider the average cost since the player's previous move, or just the cost when it is the player's turn to go, or something in between? In addition, does it get to observe the cost of the action it did not take (the full information model) or only the action it chose (the bandit model)? To abstract away these issues, and even allow different players to address them differently, we consider here two models that make only very mild assumptions on the kind of learning and adaptation made by players.

Learn then Decide model: In this model, players follow an "exploration" phase where each time it is their turn to move, they flip a coin to decide whether to follow the proposed behavior B or to do a best-response move to the current configuration. We assume that the coin gives probability at least β to B, for some constant β > 0. Finally, after some common time T*, all players switch to an "exploitation" phase, where they each commit, in an arbitrary way based on their past experience, to follow B or perform best response from then on. (The time T* is selected in advance.)

The above model assumes some degree of coordination: a fixed time T* after which all players make their decisions. One could imagine instead each player i having its own time T*_i at which it commits to one of the experts, perhaps with the time itself depending on the player's experience.
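The learn-then-decide process can be sketched as a generic simulation loop. The representation below and the `best_response`/`proposed` callbacks are our own illustration, not part of the model's formal definition, and a player's commitment is simulated by one more β-coin purely for concreteness (the model allows any commitment rule based on past experience):

```python
import random

def learn_then_decide(n, state, best_response, proposed, beta, t_star, horizon):
    """Sketch of the Learn-then-Decide dynamics.

    state[i]      : current action of player i (mutated in place)
    best_response : callback (i, state) -> a best-response action for i
    proposed      : callback i -> player i's part of the proposed behavior B
    beta          : probability of following B while exploring
    t_star        : common switch time T* to the exploitation phase
    horizon       : total number of single-player moves to simulate
    """
    committed = {}  # player -> choice fixed at commitment time
    for t in range(horizon):
        i = random.randrange(n)                # players move in a random order
        if t < t_star:
            follow_b = random.random() < beta  # exploration: coin of bias beta
        else:
            if i not in committed:             # exploitation: commit once
                committed[i] = random.random() < beta
            follow_b = committed[i]
        state[i] = proposed(i) if follow_b else best_response(i, state)
    return state
```

For instance, on the game of Figure 1 one can take `proposed(i)` to return the shared route and let `best_response` compare the cost 1 of driving with k divided by the current number of sharers.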
In the Smoothly-Adaptive model below we even more generally allow players to smoothly adjust their probabilities over time as they like, subject only to a constraint on the amount by which probabilities may change between time steps.

Smoothly Adaptive model: In this model, there are no separate exploration and exploitation phases. Instead, each player i maintains and adjusts a value p_i over time. When player i is chosen to move, it flips a coin of bias p_i to select between the proposed behavior and a best-response move, choosing B with probability p_i. We allow the players to use arbitrary adaptive learning algorithms to adjust these probabilities, with the sole requirement that learning proceed slowly. Specifically, using p_i^t to denote the value of p_i at time t, we require that

|p_i^t − p_i^{t+1}| ≤ ∆

for a sufficiently (polynomially) small quantity ∆, and furthermore that for all i the initial probability satisfies p_i^0 ≥ p_0 for some overall constant 0 < p_0 < 1. Note that the algorithm may update p_i even in time steps at which it does not move. The learning algorithm may use any kind of feedback or weight-updating strategy it wants to (e.g., gradient descent [30, 31], multiplicative updating [10, 21, 22]), subject to this bounded step-size requirement. We say that the probabilities are (T, β)-good if for any time t ≤ T we have p_i^t > β for all i. (Note that if ∆ < (p_0 − β)/T then clearly the probabilities are (T, β)-good.)

We point out that while one might at first think that any natural adaptive algorithm would learn to favor best-response (always decreasing p_i), this depends on the kind of feedback it uses. For instance, if the algorithm considers only its cost immediately after it moves, then indeed by definition best-response will appear better. However, if it considers its cost immediately before it moves (comparing that to what its cost would have been had it chosen the other alternative), or even the sum total cost since its previous move, then B might appear better. Our model allows users to update in any way they wish, so long as the updates are sufficiently gradual.

Finally, as mentioned in the introduction, in both our models it is an easy observation that for any game, if the proposed solution is a good equilibrium, then in the limit (as T* → ∞ in the Learn-then-Decide model, or as ∆ → 0 in the Smoothly Adaptive model) the system will eventually reach the equilibrium and stay there indefinitely. Our interest, however, is in polynomial-time behavior.

3 Fair Cost Sharing

The first class of games we study in this paper, because of its rich structure and wide gap between price of anarchy and price of stability, is that of fair cost sharing games. These games are defined as follows. We are given a graph G = (V, E), which can be directed or undirected, where each edge e ∈ E has a nonnegative cost c_e ≥ 0. There is a set N = {1, ..., n} of n players, where player i is associated with a source s_i and a sink t_i. The strategy set of player i is the set S_i of s_i-t_i paths. In an outcome of the game, each player i chooses a single path P_i ∈ S_i. Given a vector of players' strategies s = (P_1, ..., P_n), let x_e be the number of agents whose strategy contains edge e. In the fair cost sharing game, the cost to agent i is

cost_i(s) = Σ_{e∈P_i} c_e/x_e,

and the goal of each agent is to connect its terminals with minimum total cost. The social cost of an outcome s = (P_1, ..., P_n) is defined to be

cost(P_1, ..., P_n) = Σ_{e∈∪_i P_i} c_e.

It is well known that fair cost sharing games are potential games [2, 24], and the price of anarchy in these games is Θ(n) while the price of stability is H(n) [2], where H(n) = Σ_{i=1}^{n} 1/i = Θ(log n). In particular, the potential function for these games is

Φ(s) = Σ_{e∈E} Σ_{x=1}^{x_e} c_e/x,

which satisfies the inequality in Fact 1 below.
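For a concrete check of these definitions, the sketch below (our own representation of outcomes as edge sets, not code from the paper) computes the player costs, the social cost, and the potential Φ on a small instance, and verifies that cost(s) ≤ Φ(s) ≤ H(n)·cost(s):

```python
def fair_sharing_quantities(paths, c):
    """paths[i]: the set of edges on player i's chosen path; c[e]: edge cost.
    Returns (per-player costs, social cost, potential Phi)."""
    x = {}                                    # x_e: number of players on e
    for p in paths:
        for e in p:
            x[e] = x.get(e, 0) + 1
    player = [sum(c[e] / x[e] for e in p) for p in paths]
    social = sum(c[e] for e in x)             # each used edge is paid for once
    phi = sum(c[e] * sum(1.0 / j for j in range(1, x[e] + 1)) for e in x)
    return player, social, phi

# Three players share an edge of cost 2 and each use one private edge of cost 1.
paths = [{"shared", "own1"}, {"shared", "own2"}, {"shared", "own3"}]
c = {"shared": 2.0, "own1": 1.0, "own2": 1.0, "own3": 1.0}
player, social, phi = fair_sharing_quantities(paths, c)
assert abs(sum(player) - social) < 1e-9             # cost shares add up
assert social <= phi <= (1 + 1/2 + 1/3) * social    # cost <= Phi <= H(3)*cost
```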
OverviewoftheResultsandAnalysis: Ourmain Moreover,ifwehaveatleastclog(nm)players results for fair cost sharing are the following. If ofeach(s ,t ) pairfor sufficientlylarge constant i i we have many players of each type (the type of c,thenwithhighprobabilitythecostattimeT will a player is determined by its source s and desti- beO(OPT). i nation t ) then in both the learn-then-decide and i smoothlyadaptive models we can show that with Proof: We begin with the general case. Let nopt highprobability,behaviorwillreachastateofcost e denote the number of players who use edge e in withinalogarithmicfactorofOPTwithinapoly- OPT. We partition edges into two classes. We nomialnumberofsteps. Moreover,forthe learn- say an edge is a “high traffic” edge if nopt > then-decide model, even if we do not have many e clog(nm) where c is a sufficiently largeconstant playersofeachtypewecanshowthattheexpected (c = 32/β suffices for the argumentbelow). We costattheendoftheprocesswillbelow. sayitisa“lowtraffic”edgeotherwise. The high level idea of the analysis is that we DefineT =2nlogn.Withhighprobability,by 0 firstprovethatso longaseachplayerrandomizes time T each player has had a chance to move at 0 with probability near to 50/50, with high proba- leastonce.Weassumeinthefollowingthatthisin- bility the overall cost of the system will drop to deedisthecase.Notethatasacrudebound,atthis within a logarithmic factor of OPT in a poly- pointthe costof the system is atmostn2 ·OPT nomial number of steps; moreover, at that point (each player will move to a path of cost-share at boththebestresponseandtheproposedactionsare most OPT and therefore of actual cost at most pretty good strategies from the individualplayers n·OPT).Next,byChernoffboundsandtheunion pointofview. 
Tofinishtheanalysisinthe“Learn bound,ourchoiceofcimpliesthatwithhighprob- then Decide” model, we show that in the remain- abilityeachhigh-trafficedgeehasatleastβnopt/2 e ingstepsoftheexplorationphasetheexpectedcost playerson it at all time steps T ∈ [T ,T +n3]; 0 0 doesnotincreasebymuch;usingpropertiesofthe in particular, Chernoff boundsimply that each of potential function, we then show that in the final the at most mn3 events has probability at least “decision” round T∗, the overallpotential cannot 1 − e−βnoept/8 ≥ 1 − 1/(mn)4. In the remain- increasesubstantiallyeither,whichinturnimplies inganalysis, we assumethisindeedis thecase as aboundontheincreaseinoverallcost. well. Fortheadaptivemodel,onekeydifficultyinthe LetOPT denotethecostofplayeriinOPT, i analysis is to show that if the system reaches a so that OPT = OPT . Our assumption Pi i statewherethesocialcostislowandbothabstract aboveimpliesthatforanytimestepT undercon- actions are pretty good for most players, the cost sideration,ifplayerifollowstheproposedstrategy never goes high again. We are able to show that POPT, its cost will be at most clog(nm)OPT . i i this indeed is the case as long as there are many In particular, its cost on the low-traffic edges in players of each type, no matter how the players POPT can be at most a factor clog(nm) larger i adjust their probabilities pt or make their choice than its cost on those edges under OPT, and its i betweenbest-responseandfollowingtheproposed cost on high-traffic edges is at most a 2/β factor behavior. largerthanitscostonthoseedgesunderOPT. 7 Wenowargueasfollows.Letcost denotethe whichisnotpossiblesincebydefinitionofX we T T costofthesystemattimeT.If have costT ≥2clog(nm)OPT, ΦT ≤Φ0+(XT −X0)−TQ/n thentheexpectedcostofarandomplayerisatleast which would be negative. Therefore, with high probability stopping must occur before this time 2clog(nm)OPT/n. asdesired. 
Finally,ifwehaveatleastclog(mn)playersof Ontheotherhand,ifplayeriischosentomoveat each type, then there are nolow-trafficedgesand time T, from the above analysis its cost after the so we donotneedto lose the clog(mn)factorin move(whetheritchoosesBorbestresponse)will be at most clog(nm)OPT . The expectedvalue theargument. i ofthisquantityoverplayersichosenatrandomis We nowpresentasecondlemmawhichwillbe atmostclog(nm)OPT/n. Therefore,if usedtoanalyzethe“LearnthenDecide”model. cost ≥2clog(nm)OPT, T Lemma3 Consider a fair cost sharing game in the Learn-then-Decide model. If the cost of the theexpecteddropinpotentialattimeT isatleast systemattimeT isO(OPTlog(mn)),andT = 1 clog(nm)OPT/n. T1+poly(n)<T∗,thentheexpectedvalueofthe potentialattimeT isO(OPTlog(mn)log(n)). Finally,sincethecostattimeT wasatmostn2· 0 OPT, which implies by Fact 1 the value of the Proof: First, as argued in the proof of Lemma potentialwas at most n2(1+log(n))OPT, with 2, with high probability for any player i and any highprobabilitythiscannotcontinueformorethan time t ∈ [T ,T], the cost for player i to fol- 1 O(n3) steps. Formally, we can apply Hoeffding- low the proposed strategy at time t is at most Azumaboundsforsupermartingalesasfollows:let clog(nm)OPT for some constant c. Let us as- i usdefineQ=clog(nm)OPTand sumebelowthatthisisindeedthecase. Next, the above bound implies that if the cost ∆T =max(ΦT −ΦT−1+Q/n,−2Q) at time t ∈ [T1,T] is costt, then the expected decrease in potential caused by a random player and consider running this process stopping when moving(whetherfollowing the proposedstrategy cost <2Q. Let T orperformingbestresponse)isatleast XT =Φ0+∆1+...+∆T. 
(cost −clog(nm)OPT)/n; t Thenthroughouttheprocesswehave in particular, cost /n is the expected cost t E[XT|X1,...,XT−1]≤XT−1 ocfloga(nmran)OdoPmT)p/lnayiesranbeufpopreerbiotsundmoonvet,heaenxd- and pectedcostofarandomplayerafteritsmove.Note |XT −XT−1|≤2Q, thatthisis an expectationoverrandomnessin the choiceofplayerattimet,conditionedonthevalue wherethefirstinequalityholdsbecauseouranaly- of cost . In particular, since this holds true for sisshowinganexpecteddecreaseinpotentialofat t anyvalueof cost , we can take expectationover leastQ/nistrueevenifwecapalldecreasestoa t the entire process from time T up to t, and we maximumof2Qasinthedefinitionof∆ . So,by 1 T havethatif Hoeffding-Azuma (see Theorem 10 in Appendix B),aftern3 stepsinthenon-stoppedprocesswith E[cost ]≥clog(nm)OPT t highprobabilitywewouldhave then 1 XT −X0 ≤ 2n2Q, E[Φt+1]≤E[Φt], 8 whereΦ isthevalueofthepotentialattimestep the total increase in potential after time T∗ is at t t. mostO(OPTlogn). Finally, since Φ ≤ log(n)cost , this implies Since after all players have committed, poten- t t thatif tial can only decrease, this implies that the ex- pectedvalueofthepotentialatanytimeT′ ≥ T∗ E[Φt]≥clog(nm)log(n)OPT is O(OPTlog(mn)log(n)). Therefore, the ex- pectedcostattimeT′isatmostthismuchaswell, then asdesired. E[Φ ]≤E[Φ ]. t+1 t WenowuseLemma2toanalyzefaircostshar- Sinceforanyvalueofcost wealwayshave t inggamesintheadaptivelearningmodelwhenthe E[Φ ]≤E[Φ ]+clog(nm)OPT/n, numberofplayersni ofeachtype(i.e.,associated t+1 t toeach(s ,t )pair)islarge. i i thisinturnimpliesbyinductionontthat Theorem5 Considerafaircostsharinggamein E[Φ ]≤ theadaptivelearningmodelsatisfyingn =Ω(m) t i clog(nm)log(n)OPT+clog(nm)OPT/n for all i. There exists a T1 = poly(n) such that if the probabilities are (T ,β)-good for constant 1 forallt∈[T1,T],asdesired. β > 0,thenwithhighprobability,forallT ≥ T1 thecostattimeT isO(log(nm)OPT). 
Our main result in the “Learn then Decide” Moreover, there exists constant c such that if modelisthefollowing: n ≥max[m,clog(mn)]thenwithhighprobabil- i ity,forallT ≥T thecostattimeT isO(OPT). Theorem4 For fair cost sharing games in the 1 Learn then Decide model, a polynomial num- Proof:First,byLemma2wehavethatwithhigh ber of exploration steps T∗ is sufficient so that probabilityatsometime T ≤ T , thecostofthe the expected cost at any time T′ ≥ T∗ is 0 1 systemreachesO(OPTlog(mn)). Thekeytothe O(log(n)log(nm)OPT). argument now is to prove that once the cost be- comeslow,itwillneverbecomehighagain.Todo Proof: FromLemma2, thereexists T = poly(n) this we use the fact that n is large for all i. Our suchthatwithhighprobabilitythecostofthesys- i tem will be at most O(OPTlog(mn)) at some argumentwhichfollowstheproofofTheorem4.3 time T ∈ [T∗ − T,T∗]. From Lemma 3, this in[6]isasfollows. 1 Let U be the set of all edges in use at time T impliestheexpectedvalueofthepotentialattime 0 T∗ right before the final exploitation phase is alongwithalledgesusedinB. Ingeneral,wewill O(OPTlog(mn)log(n)). Finally, we consider insert an edge into U if it is ever used from then the decisions made at time T∗. When a player on in the process, andwe neverremoveanyedge fromU,evenifitlaterbecomesunused. Letc∗ be chooses to make a best-response move, this can thetotalcostofalledgesinU. So,c∗ isanupper only decrease potential. When a player chooses boundonthecostofthecurrentstateandtheonly B,thiscouldincreasepotential. However,forany wayc∗ canincreaseisbysomeplayerchoosinga edge e in the proposed solution B, the total in- bestresponsepaththatincludessomeedgenotin creaseinpotentialcausedbyedgeeoverallplay- U. Now,noticethatanytimeabest-responsepath erswhohaveeintheirproposedsolutionisatmost forsome(s ,t )playerusessuchanedge,thetotal i i ce·H(noept)=O(celogn). 
costofalledgesinsertedintoU isatmostc∗/ni, because the current player can always choose to Thisisbecausewheneveranewplayermakesade- take the path used by the least-cost player of his cision to commit to following the proposed strat- type and those n players are currently sharing a i egy and using edge e, all previous players who totalcostofatmostc∗. Thus,anytimenewedges made that commitment after time T and whose are added into U, c∗ increases by at most a mul- proposedstrategyusesedgeearestillthere.Thus, tiplicative(1+1/n )factor. We caninsertedges i 9 intoU atmostmtimes,sothefinalcostisatmost ofΩ(n2).1 Thisissubstantiallyworsethanoptimal (eitherbyanΩ(n2)orinfinitefactor,dependingon cost(T0)(1+1/ni∗)m, whetheroneallowsanadditiveoffsetornot). Unlike in the case of fair cost-sharing games, whereni∗ = minini. Thisimpliesthataslongas for consensus games there is no hope of quickly ni∗ = Ω(m) we have cost(T) = O(cost(T0)) achievingnear-optimalbehaviorfor allβ > 0. In forallT ≥T .Thus,overallthetotalcostremains particular, for any β < 1/2 there exists α > 0 0 O(OPTlog(mn)). in the example above such that if players choose Ifn ≥ max[m,clog(mn)], wesimplyusethe the proposed strategy B with probability β, with i improvedguaranteeprovidedby Lemma 2 to say high probability all nodes have more neighbors that with high probability the cost of the system performingbestresponsethanfollowingB. Thus, reachesO(OPT)withinT ≤ T timesteps,and it is easy to see by induction that no matter what 0 1 thenapplythechargingargumentasaboveinorder strategyBis,allbest-responseplayerswillremain togetthedesiredresult. withtheirinitialcolor.Moreover,thisremainstrue for exponentiallyin n many steps.2 On the other hand,thisargumentbreaksdownforβ > 1/2. In Variations: Onecouldimagineavariationonour fact,wewillshowthatforβ >1/2,foranygraph frameworkwhereratherthanproposingafixedbe- G, thesystemwillinpolynomialtimereachopti- havior B, one can propose a more complex strat- malbehavior(ifB =OPT). 
egysuchas“followB untiltimestepT∗ andthen It is interesting to compare this with the cen- follow B′” or “follow B until time step T∗ and tralized model of [5] in which a central authority then perform best-response”. In the latter case, takescontrolofa randomconstantfractionofthe the Smoothly-Adaptive model with (T∗,β)-good players and aims to use them to guide the selfish probabilitiesbecomesessentiallyaspecialcaseof behavior of the others. In that setting, even sim- theLearn-then-Decidemodelandallresultsabove ple low-degree graphs can cause a problem. For for the Learn-then-Decide model go through im- instance, consider a graph G consisting of n/4 mediately. On the other hand, this type of strat- 4-cycleseach in the initialequilibriumconfigura- egyrequiresaformofglobalcoordinationthatone tion red,red,blue,blue. If the authoritycontrols wouldprefertoavoid. a random β fraction of players (for any constant β <1),withhighprobabilityaconstantfractionof the 4-cycles contain no players under the author- 4 ConsensusGames ity’s control and will remain in a high-cost state. Anotherinterestingclassofgamesweconsider Ontheotherhand,itiseasytoseethatthisspecific isthatofconsensusgames. Here,playersarever- examplewillperformwellinbothourlearn-then- tices in an n-vertex graph G, and each have two decideorsmoothly-adaptivemodels.Weshowthat choices of action, red or blue. Players incur a in fact all graphsG performwell, thoughthis re- cost equal to the number of neighbors they have quirescareintheargumentduetocorrelationsthat ofdifferentcolor,andtheoverallsocialcostisthe mayarise. sum of costs of each player. The potential func- We assumethatthe proposedbehavior B isthe tionforconsensusgamesissimplyhalfthesocial optimal behavior “all blue” and prove that with costfunction,orequivalentlythenumberofedges high probability the configuration will reach this havingendpointsofdifferentcolor. Whileinthese stateafterO(nlog2n)steps. 
Webeginwithasim- games optimal behavior is trivial to describe (all redorallblueforatotalcostof0)andisanequi- 1Intuitivelyonecanthinkofthisexampleastwocountries, oneusingEnglishunitsandoneusingthemetricsystem,nei- librium, they also have extremely poor equilibria therwantingtoswitch. aswell. Forexampleconsidertwocliquesof n/2 2Ofcourse,inthelimitasthenumberoftimestepsT → vertices each, with each node also having αn/2 ∞eventuallywewillobserveasequenceinwhichallplayers neighborsintheothercliqueforsome0<α<1. selectB,andifBisanequilibriumthesystemwillthenremain there forever. Thus if B is “all blue” the system will in the In this case, thereis a Nash equilibriumwith one limitconvergetooptimalbehavior. However,ourinterestisin cliqueredandonecliqueblue,foranoverallcost polynomialtimeconvergence. 10
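To make the consensus dynamics concrete, the following is a small simulation sketch (ours, not from the paper) of the two-clique example: at each step a uniformly random player either follows the proposed behavior B = "all blue" with probability β, or else plays best response. The graph constructor `two_cliques`, the cyclic choice of cross-clique neighbors, the tie-breaking rule, and the random-player schedule are all simplifying assumptions.

```python
import random

def two_cliques(n, alpha):
    """Two cliques of n/2 vertices; each vertex is also adjacent to
    int(alpha*n/2) vertices of the other clique (a fixed cyclic pattern,
    chosen here only for concreteness)."""
    half = n // 2
    k = int(alpha * half)
    adj = {v: set() for v in range(n)}
    for start in (0, half):                       # intra-clique edges
        for u in range(start, start + half):
            for w in range(start, start + half):
                if u != w:
                    adj[u].add(w)
    for u in range(half):                         # cross-clique edges
        for j in range(k):
            w = half + (u + j) % half
            adj[u].add(w)
            adj[w].add(u)
    return adj

def social_cost(adj, color):
    """Each player pays the number of differently-colored neighbors."""
    return sum(1 for u in adj for w in adj[u] if color[u] != color[w])

def best_response(adj, color, u):
    """Match the majority color among u's neighbors (ties -> blue)."""
    reds = sum(1 for w in adj[u] if color[w] == 'red')
    return 'red' if reds > len(adj[u]) - reds else 'blue'

def run(adj, color, beta, steps, rng):
    """Each step: a random player follows B = 'all blue' with probability
    beta, and otherwise best-responds to the current configuration."""
    color = dict(color)
    for _ in range(steps):
        u = rng.randrange(len(adj))
        color[u] = 'blue' if rng.random() < beta else best_response(adj, color, u)
    return color
```

On the split coloring (one clique red, one blue) every player is already playing a best response, so with β = 0 the high-cost state persists indefinitely, while for β near 1 the proposed all-blue state of cost 0 is reached quickly; this only illustrates the two extremes, not the paper's threshold analysis at β = 1/2.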