Regret Minimization and the Price of Total Anarchy∗

Avrim Blum (Carnegie Mellon, Pittsburgh, PA), MohammadTaghi Hajiaghayi† (AT&T Labs–Research, Florham Park, NJ), Katrina Ligett‡ (Carnegie Mellon, Pittsburgh, PA), Aaron Roth (Carnegie Mellon, Pittsburgh, PA)

March 7, 2008

Abstract

We propose weakening the assumption made when studying the price of anarchy: Rather than assume that self-interested players will play according to a Nash equilibrium (which may even be computationally hard to find), we assume only that selfish players play so as to minimize their own regret. Regret minimization can be done via simple, efficient algorithms even in many settings where the number of action choices for each player is exponential in the natural parameters of the problem. We prove that despite our weakened assumptions, in several broad classes of games, this “price of total anarchy” matches the Nash price of anarchy, even though play may never converge to Nash equilibrium. In contrast to the price of anarchy and the recently introduced price of sinking [15], which require all players to behave in a prescribed manner, we show that the price of total anarchy is in many cases resilient to the presence of Byzantine players, about whom we make no assumptions. Finally, because the price of total anarchy is an upper bound on the price of anarchy even in mixed strategies, for some games our results yield as corollaries previously unknown bounds on the price of anarchy in mixed strategies.

1 Introduction

Computer systems increasingly involve the interaction of multiple self-interested agents. The designers of these systems have objectives they wish to optimize, but by allowing selfish agents to interact in the system, they lose the ability to directly control behavior. How much is lost by this lack of centralized control? Much as the study of approximation algorithms aims to understand what is lost when computation is limited, and the field of online algorithms aims to understand what is lost when information is limited, the study of the price of anarchy has aimed to understand what is lost when central organization is limited.
In order to study the cost incurred when coordination is lost, we must make some assumption about how selfish agents behave. Traditionally, the assumption has been that selfish agents will play Nash equilibrium strategies, and the price of anarchy of a game is defined to be the ratio of the value of the objective function in the worst Nash equilibrium to the social optimum value.

It does not seem realistic, however, to assume that all agents in a system will necessarily play strategies that form a Nash equilibrium. Even with centralized control, Nash equilibria can be computationally hard to find. Moreover, even when Nash equilibria are easy to find computationally, there is no reason in general to believe that distributed self-interested agents, often with limited information about the overall state of the system, will necessarily converge to them. In addition, for games with only mixed-strategy equilibria, we would have to assume that rational agents not only play so as to maximize their own utility, but also so as to preserve the stability of the system. Since a game may have many Nash equilibria, and agents may individually prefer different equilibria, it is not clear why agents would want to preserve the stability of a Nash equilibrium, even if they managed to reach one.

∗ Supported in part by the National Science Foundation under grant CCF-0514922.
† Work completed while the author was a postdoctoral fellow at Carnegie Mellon.
‡ Supported in part by an AT&T Labs Graduate Research Fellowship and an NSF Graduate Research Fellowship.

In this paper, we study the value obtained in games with selfish agents when we make a much weaker and more realistic assumption about their behavior. We consider repeated play of the game and allow agents to play any sequence of actions with only the assumption that this action sequence has low regret with respect to the best fixed action in hindsight. This “price of total anarchy” is strictly a generalization of price of anarchy, since in a Nash equilibrium, all players have zero regret.
Regret minimization is a realistic assumption because there exist a number of efficient algorithms for playing games that guarantee regret that tends to zero, because it requires only localized information, and because in a game with many players in which the actions of any single player do not greatly affect the decisions of other players (as is often studied in the network setting), players can only improve their situation by switching from a strategy with high regret to a strategy with low regret.

We consider four classes of games: Hotelling games, in which players compete with each other for market share; valid games [30] (a broad class of games that includes among others facility location, market sharing [14], traffic routing, and multiple-item auctions); linear congestion games with atomic players and unsplittable flow [1] [7]; and parallel link congestion games [24]. We prove that in the first three cases, the price of total anarchy matches the price of anarchy exactly even if the play itself is not approaching equilibrium; for parallel link congestion we get an exact match for n = 2 links but an exponentially greater price for general n when the social cost function is the makespan. When we consider average load instead, we prove that if the machine speeds are relatively bounded, the price of total anarchy is 1 + o(1), matching the price of anarchy. For linear congestion games and average cost load balancing, the price of anarchy bounds were previously only known for pure strategy Nash equilibria, and as a corollary of our price of total anarchy bounds, we prove the corresponding price of anarchy bound for mixed Nash equilibria as well.

Most of our results further extend to the case in which only some of the agents are acting to minimize regret and others are acting in an arbitrary (possibly adversarial) manner. When studying anarchy, it is vital to consider players who behave unpredictably, and yet this has been largely ignored up until now.
Since Nash equilibria are stable only if all players are participating, and sink equilibria [15] are defined over state graphs that assume that all players play rationally, such guarantees are not possible under the standard price of anarchy model or the price of sinking model [15].¹

1.1 Regret minimization and the price of total anarchy

The regret of a sequence of actions in a repeated game is defined as the difference between the average cost incurred by those actions and the average cost the best fixed solution would have incurred, where the best is chosen with the benefit of hindsight. An algorithm is called regret-minimizing, or no-regret, if the expected regret it incurs goes to zero as a function of time.

Regret-minimizing algorithms have been known since the 1950s, when Hannan [16] gave such an algorithm for repeated two-player games. Recent work on regret minimization has focused on algorithmic efficiency and convergence rates as a function of the number of actions available, and has broadened the set of situations in which no-regret algorithms are known. Kalai and Vempala [20] show that Hannan's algorithm can be used to solve online linear optimization problems with regret approaching 0 at a rate $O(1/\sqrt{T})$, given access to an exact best-response oracle. Zinkevich [31] developed a regret-minimizing algorithm for online convex optimization problems. So-called bandit algorithms have also been developed [2, 26, 10, 22], which achieve low regret even in the situation where the algorithm receives very limited information after each round of play. Kakade et al. [19] show how to use an α-approximate best-response oracle to achieve online performance in linear optimization problems that is close to α times that of the best static solution. Those results provide efficient algorithms for many situations in which the number of strategies for each player is exponential in the size of the natural representation of the game. In cases where each player has only a polynomial number of strategies, Littlestone and Warmuth's weighted majority algorithm [25] can be used to minimize regret.

¹ Babaioff et al. [3] propose a model of network congestion with “malicious” players. Their model defines malicious behavior as optimizing a specific function, however, and is not equivalent to arbitrary play.

In this paper, we propose regret-minimization as a reasonable definition of self-interested behavior and study the outcome of such behavior in a variety of classes of repeated games. We introduce the term “price of total anarchy” to describe the ratio between the optimum social cost and the social welfare achieved in a game where the players minimize regret. We note that the guarantees we prove using the no-regret property are strictly stronger than minimax guarantees.

1.2 Related work: Price of anarchy

Economists have long studied games with self-interested players. A Nash equilibrium in such a game is a profile of strategies for each player such that, given the strategies of the other players, no player prefers to deviate from her strategy in the profile. A Nash equilibrium can be pure or mixed, depending on whether the players all play pure, deterministic strategies, or they randomize over pure strategies to give a mixed strategy.

The study of the effect of selfishness in games has been quite popular in computer science for much of the past decade. In 1999, Koutsoupias and Papadimitriou [24] introduced the notion of the price of anarchy as a measure of this effect: they studied the ratio between the social welfare of the optimum solution and that of the worst Nash equilibrium. Since then, researchers have studied the price of anarchy in a wide variety of games (for example [28, 30, 11]).

Unfortunately, Nash equilibria are not necessarily the best definition of selfish behavior. In 2-player, n-action games, Nash equilibria are PPAD-hard to compute [4], but in any game with a polynomial number of actions, one can run regret-minimizing algorithms. (One can also do so efficiently in many settings with even an exponential number of actions.)
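To illustrate the kind of regret-minimizing algorithm available when the number of actions is polynomial, here is a minimal sketch of an exponential-weights (Hedge) update, a close relative of the weighted majority algorithm. It is not taken from the paper: the function name `hedge`, the cost-matrix input, and the step size `eta` are assumptions chosen for illustration, and costs are assumed to lie in [0, 1].

```python
import math
import random

def hedge(costs, eta):
    """Exponential-weights sketch over a cost matrix costs[t][a] in [0, 1].

    Returns (expected average cost of the algorithm,
             average regret against the best fixed action in hindsight).
    The expectation over the algorithm's randomness is computed exactly.
    """
    n_actions = len(costs[0])
    weights = [1.0] * n_actions
    total_expected = 0.0
    for round_costs in costs:
        z = sum(weights)
        probs = [w / z for w in weights]
        # Expected cost this round under the current mixed strategy.
        total_expected += sum(p * c for p, c in zip(probs, round_costs))
        # Penalize each action exponentially in the cost it just incurred.
        weights = [w * math.exp(-eta * c) for w, c in zip(weights, round_costs)]
    T = len(costs)
    best_fixed = min(sum(row[a] for row in costs) for a in range(n_actions))
    return total_expected / T, (total_expected - best_fixed) / T

if __name__ == "__main__":
    random.seed(0)
    T, N = 2000, 5
    demo = [[random.random() for _ in range(N)] for _ in range(T)]
    print(hedge(demo, math.sqrt(8 * math.log(N) / T)))
```

With the step size used in the demo, the standard analysis of this update bounds the average regret by sqrt(ln N / (2T)), which tends to zero as T grows.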
Many games only admit mixed Nash equilibria, and there is no immediate incentive for players to play their given mixed strategy as opposed to any one of the pure strategies in the support of the mixed strategy. In addition, there is no reason to assume in general games that agents demonstrating selfish behavior should converge to a Nash equilibrium.

Our work is most similar in spirit to that of Mirrokni and Vetta [27] and Goemans et al. [15], who also question the plausibility of selfish agents converging to Nash equilibria. They introduce the notion of sink equilibria, which generalize Nash equilibria in a different way than we do. In doing so, they abandon simultaneous play, and instead consider sequential myopic best response plays. They analyze sink equilibria in the class of valid games and show that valid games have a price of sinking of between n and n + 1. In contrast, we prove that valid games have a price of total anarchy of 2, matching the (Nash) price of anarchy. One reason for this gap is that myopic best responses provide no guarantee about the payoff of any individual player. Indeed, the example in [15] of a valid game with price of sinking n demonstrates that myopic best response is not always rational: In their example, myopic best response players each expect average payoff tending to zero as the number of players increases, whereas they could each easily guarantee themselves payoffs of one on every turn (and would do so if they minimized regret). Additionally, because sink equilibria rely on play entering and never leaving sinks of a best response graph, the price of sinking is brittle to Byzantine players who may not be playing best responses. In contrast, we show that valid games have a price of total anarchy of 2 even in the presence of arbitrarily many Byzantine players, about whom we make no assumptions.

1.3 Related work: Correlated equilibria

Foster and Vohra [12] show that any algorithm that minimizes a stronger notion of regret known as “internal regret” will result in an empirical distribution of play that converges to a weaker notion of equilibrium, a correlated equilibrium.
In addition, several polynomial-time internal-regret-minimizing algorithms are known for settings in which action choices are explicitly given [17]. Because we place a weaker assumption on the agents' algorithms, there are more algorithms, simpler algorithms, and more efficient algorithms for regret minimization than for internal regret minimization. In addition, we are able to prove guarantees even in Byzantine settings, where not all players behave rationally; such settings need not correspond to correlated equilibria.

1.4 Our results

In this paper, we study the price of total anarchy in four classes of games. We emphasize that our analysis does not presume that players play according to any particular class of algorithms; our results hold whenever players happen to experience low regret, which is a strictly weaker assumption than that players play according to a Nash equilibrium.

In Section 3 we examine a class of generalized Hotelling games, where sellers select locations on a graph and achieve revenues that depend on their own locations as well as the locations chosen by the other sellers. We prove that for such games (and an even broader class, see Section 3.3), any regret minimizing player gets at least half of her fair share of the sales, regardless of how the other (Byzantine) players behave.² This result exactly matches the price of anarchy in these games.

Valid games, introduced by Vetta [30], model games where the social utility is submodular, the private utility of each player is at least her Vickrey utility (the amount her presence contributes to the overall welfare), and where the sum of the players' private utilities is at most the total social utility. In Section 4 we prove that the price of total anarchy in valid games with nondecreasing social utility functions exactly matches the (Nash) price of anarchy, even if Byzantine players are added to the system.

Finally, in Section 5, we analyze atomic congestion games with two types of social welfare functions.
First, we consider unweighted atomic congestion games with player-summed social welfare functions, and in both the linear cost and the polynomial cost case, we show price of total anarchy results that match the price of anarchy [7, 1]. Next, we consider a parallel link congestion game with social welfare equal to makespan, the game that initiated the study of the price of anarchy [24], and show that the price of total anarchy of the parallel link congestion game with two links is 3/2, exactly matching the price of anarchy. We also show that the price of total anarchy in the parallel link game with n links is $\Omega(\sqrt{n})$, which is strictly worse than the price of anarchy. Finally, we show a price of total anarchy matching the known price of anarchy in the load balancing game with the sum social utility function. In the case of load balancing with sum social utility, our price of total anarchy results also yield previously unknown price of anarchy results for mixed strategies.

In Section 6, we discuss techniques for minimizing regret in each of these settings.

2 Preliminaries

In this paper, we consider k-player games. For each player i, we denote by $\mathcal{A}_i$ the set of pure strategies available to that player. A mixed strategy is a probability distribution over actions in $\mathcal{A}_i$; we denote by $\mathcal{S}_i$ the set of mixed strategies available to player i. Let $\mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2 \times \cdots \times \mathcal{A}_k$ and $\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_2 \times \cdots \times \mathcal{S}_k$. Every game has an associated social utility function $\gamma : \mathcal{A} \to \mathbb{R}$ that takes a set containing an action for each player to some real value. Each player i has an individual utility function $\alpha_i : \mathcal{A} \to \mathbb{R}$.

We often want to talk about the social or individual utility of a strategy profile $S = \{s_1, \ldots, s_k\} \in \mathcal{S}$. To this end, we denote by $\bar{\gamma} : \mathcal{S} \to \mathbb{R}$ the expected social utility over randomness of the players and by $\bar{\alpha}_i : \mathcal{S} \to \mathbb{R}$ the expected value of the utility of a strategy profile to player i. We denote the social value of the socially optimum strategy profile by $OPT = \max_{S \in \mathcal{S}} \bar{\gamma}(S)$ in maximization problems. Correspondingly, $OPT = \min_{S \in \mathcal{S}} \bar{\gamma}(S)$ in minimization problems.

We also sometimes wish to talk about a modification of a particular strategy profile; let $S \oplus s'_i$ be the strategy set obtained if player i changes her strategy from $s_i$ to $s'_i$.
Let $\emptyset_i$ be the null strategy for player i (player i takes no action). We use superscripts to denote time, so $S^t$ is the strategy profile at time t; $s_i^t$ is player i's strategy at time t.

We consider both maximization and minimization games in this paper. In maximization games the goal is to maximize the social utility function and the players wish to maximize their individual utility functions; in minimization games, both quantities are minimized. We define the price of anarchy and the price of total anarchy so that their values are always greater than or equal to one, regardless of whether we are discussing a maximization or a minimization game:

Definition 2.1. The price of anarchy for an instance of a maximization game is defined to be $OPT / \bar{\gamma}(S)$, where S is the worst Nash equilibrium for the game (the equilibrium that maximizes the price of anarchy). The price of anarchy for an instance of a minimization game is defined to be $\bar{\gamma}(S) / OPT$, where S is the worst Nash equilibrium for the game (the equilibrium that maximizes the price of anarchy).

² We note that robustness to Byzantine players is not inherent in our model. Indeed, there exist games for which the addition of Byzantine players can make the social welfare, as well as the utilities of individual regret-minimizing players, arbitrarily bad.

Definition 2.2. The regret of player i in a maximization game given action sets $A^1, A^2, \ldots, A^T$ is

$$\max_{a_i \in \mathcal{A}_i} \frac{1}{T} \sum_{t=1}^{T} \alpha_i(A^t \oplus a_i) - \frac{1}{T} \sum_{t=1}^{T} \alpha_i(A^t).$$

The regret of player i in a minimization game given action sets $A^1, A^2, \ldots, A^T$ is

$$\frac{1}{T} \sum_{t=1}^{T} \alpha_i(A^t) - \min_{a_i \in \mathcal{A}_i} \frac{1}{T} \sum_{t=1}^{T} \alpha_i(A^t \oplus a_i).$$

A regret-minimizing algorithm is one with low expected regret.

Property 2.3. When a player i uses a regret-minimizing algorithm or achieves low regret, for any sequence $A^1, \ldots, A^T$, she achieves the property

$$\max_{a_i \in \mathcal{A}_i} \frac{1}{T} \sum_{t=1}^{T} \alpha_i(A^t \oplus a_i) \le R(T) + E\left[\frac{1}{T} \sum_{t=1}^{T} \alpha_i(A^t)\right]$$

for maximization games and

$$E\left[\frac{1}{T} \sum_{t=1}^{T} \alpha_i(A^t)\right] \le R(T) + \min_{a_i \in \mathcal{A}_i} \frac{1}{T} \sum_{t=1}^{T} \alpha_i(A^t \oplus a_i)$$

for minimization games, where expectation is over the internal randomness of the algorithm, and where $R(T) \to 0$ as $T \to \infty$. The function R(T) may depend on the size of the game or a compact representation thereof.
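As a concrete reading of the maximization-game regret in Definition 2.2, the following sketch computes it for a recorded history of pure action profiles. The names `average_regret` and `match_utility`, and the representation of a profile as a tuple, are illustrative assumptions rather than anything defined in the paper.

```python
def average_regret(utility, action_set, history, i):
    """Regret of player i in a maximization game, for a history of profiles.

    utility(profile, i) -> payoff of player i under the tuple `profile`.
    action_set: the pure actions available to player i.
    history: list of action profiles (tuples), one per time step.
    """
    T = len(history)
    realized = sum(utility(profile, i) for profile in history) / T

    def swapped(profile, a):
        # The profile with player i's action replaced by the fixed action a.
        return profile[:i] + (a,) + profile[i + 1:]

    best_fixed = max(
        sum(utility(swapped(profile, a), i) for profile in history) / T
        for a in action_set
    )
    return best_fixed - realized


def match_utility(profile, i):
    """Toy two-player payoff (an assumption): 1 if both actions match, else 0."""
    return 1.0 if profile[0] == profile[1] else 0.0


if __name__ == "__main__":
    history = [(0, 1), (0, 1), (0, 0)]
    print(average_regret(match_utility, [0, 1], history, 0))
```

In the demo, player 0 matched once in three rounds, while always playing action 1 would have matched twice, so the regret is 1/3.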
We then define $T_\epsilon$ to be the number of time steps required to get $R(T) = \epsilon$.

Note that this implies that, for any sequence $S^1, \ldots, S^T$, a player with the regret-minimizing property achieves

$$\max_{a_i \in \mathcal{A}_i} \frac{1}{T} \sum_{t=1}^{T} \bar{\alpha}_i(S^t \oplus a_i) \le R(T) + \frac{1}{T} \sum_{t=1}^{T} \bar{\alpha}_i(S^t)$$

for maximization games and

$$\frac{1}{T} \sum_{t=1}^{T} \bar{\alpha}_i(S^t) \le R(T) + \min_{a_i \in \mathcal{A}_i} \frac{1}{T} \sum_{t=1}^{T} \bar{\alpha}_i(S^t \oplus a_i)$$

for minimization games.

Definition 2.4. The price of total anarchy for an instance of a maximization game is defined to be

$$\max \frac{OPT}{\frac{1}{T} \sum_{t=1}^{T} \bar{\gamma}(S^t)},$$

where the max is taken over all T and $S^1, S^2, \ldots, S^T$, where $S^1, \ldots, S^T$ are play profiles of players with the regret-minimizing property. The price of total anarchy for an instance of a minimization game is defined to be

$$\max \frac{\frac{1}{T} \sum_{t=1}^{T} \bar{\gamma}(S^t)}{OPT},$$

where the max is taken over all T and $S^1, S^2, \ldots, S^T$, where $S^1, \ldots, S^T$ are play profiles of players with the regret-minimizing property.

Because all players have zero regret when playing a Nash equilibrium, the price of total anarchy of a game is never less than its price of anarchy. In this paper we study the price of anarchy and the price of total anarchy for general classes of games. The price of (total) anarchy for a class of games is defined to be the maximum price of (total) anarchy over any instance in that class. Bounds on the price of (total) anarchy for a class of games may not be tight for particular instances in that class.

3 Hotelling games

Hotelling games [18] are well studied in the economics literature; see, for example, [13] and [21] for surveys. Hotelling games are traditionally location games played on a line, but we generalize them to an arbitrary graph and a broad class of behaviors on the part of the customers. We prove our result first for a specific Hotelling game, and then observe that our proof still holds in a much more general setting.

3.1 Definition and price of anarchy

Imagine a set of souvenir stand owners in Paris who must decide where to set up their souvenir stands each day. Every day, n tourists buy a souvenir from whichever stand they find first. Each stand operator wishes to maximize her own sales.
Every day there are n sales, and we wish to maximize fairness: The social welfare function is the minimum sales of any souvenir stand. Formally, this maximization game is defined by an n vertex graph G = (V, E). Every seller i among the k sellers has strategy set $\mathcal{A}_i = V$, that is, every day she sets up her stand on some vertex of the graph. Each day, every tourist chooses a path from some private distribution over paths on the graph, and buys from the seller he encounters first (for instance, as a special case, we could have one tourist at each vertex of the graph who purchases from the nearest souvenir stand). If there is a tie between sellers, we assume the tourist splits his contribution among them equally. At any time t the social welfare is $\bar{\gamma}(S^t) = \min_i \bar{\alpha}_i(S^t)$. The social optimum is obtained by splitting all vertices equally among all k players (this can be achieved if all players play on the same vertex). Therefore OPT = n/k.

Theorem 3.1. The price of anarchy of the Hotelling game is (2k − 2)/k.

Proof. Given a strategy set S, consider the alternate set $(S \oplus \emptyset_i)$. There are k − 1 active players in this alternate set and the total payoff is still n, so there must be some player h who achieves expected payoff $\bar{\alpha}_h(S \oplus \emptyset_i) \ge n/(k-1)$. If player i played the same strategy as player h, she would achieve expected payoff $\bar{\alpha}_i(S \oplus s_h) \ge n/(2k-2)$. Thus, any strategy achieving expected payoff less than n/(2k − 2) is not an equilibrium strategy, since in a Nash equilibrium, no player wishes to change her strategy.

This bound is tight: Consider a game on a graph with k − 1 identical stars, where we identify tourists with vertices of the graph and each patronizes the nearest souvenir stand. In this example, k − 1 of the players play deterministically at the center of their own star; player k plays uniformly at random over all k − 1 star centers. This strategy set S is a Nash equilibrium, and the randomizing player earns $\bar{\alpha}_k(S) = n/(2k-2)$ (the other players do better), so the social welfare is $\bar{\gamma}(S) = n/(2k-2)$. Since OPT = n/k, this demonstrates that the price of anarchy is $OPT / \bar{\gamma}(S) = (2k-2)/k$.

3.2 Price of total anarchy

Since at a Nash equilibrium, no player has regret, the price of total anarchy for the Hotelling game is at least (2k − 2)/k. In this section, we show that this value is tight; that is:

Theorem 3.2. The price of total anarchy in the Hotelling game is (2k − 2)/k, matching the price of anarchy.

The proof of this theorem relies on the symmetry of the game; this property was similarly useful to Chien and Sinclair [5] in the context of studying convergence to Nash equilibria in symmetric congestion games.

Let $O_i^t$ be the set of plays at time t by all players other than player i. Let $O_i = \sum_{t=1}^{T} O_i^t$, the union with multiplicity of all plays of players other than i over all time periods.

Definition 3.3. Let $\Delta_i^{t \to u}$ be the quantity such that if player i plays an action uniformly at random from $O_i^t$ at time step u, she achieves expected payoff $n/(2k-2) + \Delta_i^{t \to u}$. Note that $\Delta_i^{t \to t}$ is always nonnegative because the k − 1 other players have average payoff exactly n/(k − 1) when player i is removed.

Lemma 3.4. For all i, for all $1 \le t, u \le T$: $\Delta_i^{u \to t} + \Delta_i^{t \to u} \ge 0$.

Proof. If t = u, the claim follows easily, as noted in the definition. Otherwise, imagine a (2k − 2)-player game in which there is a time-t player and a time-u player for each original player other than i. The time-t version of a player j plays strategy $s_j^t$; the time-u version plays $s_j^u$. Since the sum of all players' payoffs is n, if player i picks a random strategy from among those already being played and plays it in this imaginary game replacing the player she copies, i expects to have payoff n/(2k − 2). Half of the time, player i will select a time-t strategy and replace that time-t player. It can only improve i's payoff in this case to remove all of the other time-t players and only play against time-u players. This leaves i playing a strategy uniformly selected from $O_i^t$ at time u. A parallel argument holds the other half of the time, when player i selects a time-u strategy, and thus

$$\frac{n}{2k-2} \le \frac{1}{2}\left(\frac{n}{2k-2} + \Delta_i^{t \to u}\right) + \frac{1}{2}\left(\frac{n}{2k-2} + \Delta_i^{u \to t}\right) = \frac{n}{2k-2} + \frac{1}{2}\left(\Delta_i^{t \to u} + \Delta_i^{u \to t}\right)$$

as desired.

Proof of Theorem 3.2. Fix a sequence of plays $S^1, \ldots, S^T$. Recall that $O_i = O_i^1 + \cdots + O_i^T$.
Define $o_i^t$ to be the uniform distribution over $O_i^t$. Picking an action a uniformly at random from $O_i$ is equivalent to picking a random time step u and then picking a strategy $a \in O_i^u$ uniformly at random. Player i's expected payoff had she randomly selected $o_i^u$ and played it over all T rounds is

$$\frac{1}{T} \sum_{u=1}^{T} \sum_{t=1}^{T} \bar{\alpha}_i(S^t \oplus o_i^u) = \frac{1}{T} \sum_{u=1}^{T} \sum_{t=1}^{T} \left(\frac{n}{2k-2} + \Delta_i^{u \to t}\right) = \frac{Tn}{2k-2} + \frac{1}{T} \sum_{u=1}^{T} \sum_{t=1}^{T} \Delta_i^{u \to t} \ge \frac{Tn}{2k-2},$$

where the last inequality holds because of Lemma 3.4. Therefore, there must be some single fixed action $a^* \in S$ that achieves at least Tn/(2k − 2) when played over T rounds of the above game. Any regret minimizing player achieves expected total payoff at least this much (minus ε), and so has expected payoff at least n/(2k − 2) − ε, proving the theorem.

3.3 The price of total anarchy in generalized Hotelling games

We note that the proof of Theorem 3.2 made no use of the specifics of the Hotelling game described above. In particular, the same proof shows that any regret minimizing player achieves expected payoff approaching n/(2k − 2) regardless of how other players behave, and so we are able to guarantee good payoff among regret-minimizing players even in the presence of Byzantine players making arbitrary (or adversarial) decisions.

Theorem 3.5. Any player who minimizes regret in the Hotelling game achieves payoff approaching n/(2k − 2), regardless of how the other players play.

The same proof also holds when the buyers use much more general rules for choosing which stand to patronize.³ Neither do we use the fact that players' utilities are linear. In fact, our proof only makes use of three properties of the Hotelling game:

1. Constant Sum: The individual utilities of the players in the game always sum to the same value, regardless of play.
2. Symmetric: All players have the same action set, and the payoff vector is a function of the action vector that is invariant to a permutation of the names of the players.
3. Monotone: The game is defined for any number of players, and removing players from the game (while keeping the strategies of the remaining players fixed) does not decrease the payoff for any remaining player.
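The three properties above can be checked numerically on a small instance. The sketch below assumes the special case mentioned in Section 3.1 (one tourist per vertex buying from the nearest stand, ties split equally) on a path graph; the path graph, the function name `hotelling_payoffs`, and the chosen positions are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def hotelling_payoffs(positions, n):
    """Seller payoffs on a path with vertices 0..n-1 and one tourist per vertex.

    positions[j] is the vertex chosen by seller j. Each tourist buys from the
    nearest stand; a tie (including co-located stands) is split equally.
    """
    payoffs = [Fraction(0)] * len(positions)
    for v in range(n):
        dists = [abs(v - p) for p in positions]
        nearest = min(dists)
        winners = [j for j, d in enumerate(dists) if d == nearest]
        for j in winners:
            payoffs[j] += Fraction(1, len(winners))
    return payoffs

if __name__ == "__main__":
    three = hotelling_payoffs([2, 2, 7], 10)
    print(three, sum(three))               # constant sum: payoffs total n
    print(hotelling_payoffs([2, 7], 10))   # removing a seller hurts nobody
```

In this instance the two co-located sellers each earn 5/2, which equals n/(2k − 2) for n = 10 and k = 3, and removing one of them raises the other's payoff to 5, consistent with monotonicity.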
We call such games with the “fairness” social utility function $\bar{\gamma}(S) = \min_i \alpha_i(S)$ generalized Hotelling games and get the following theorem:

Theorem 3.6. In any k-player, generalized Hotelling game, the price of total anarchy among regret minimizing players is (2k − 2)/k even in the presence of arbitrarily many Byzantine players.

³ One caveat is that customers may not in general base their selection rules on the actions of the players, for instance by patronizing the second closest souvenir stand. If we were to allow rules such as this, removing players from the game could decrease the payoff of some of the remaining players, and we rely on this not being the case.

3.4 Regret minimization need not converge

Since players may efficiently minimize regret in Hotelling games, but may not necessarily be able to compute Nash equilibria, it is notable that we are able to match standard price-of-anarchy guarantees. In fact, it is possible that regret-minimizing players in Hotelling games never converge to a Nash equilibrium:

Theorem 3.7. Even if all players in the Hotelling game are regret minimizing, stage game play need not converge to Nash equilibrium.

Proof. Consider k players {0, ..., k − 1} on a graph with k − 1 identical n-vertex stars with centers $v_0, \ldots, v_{k-2}$ and an isolated vertex $v_{k-1}$. At time period t, player i plays on vertex $v_{(t+i) \bmod k}$. Each player has expected payoff (n(k − 1) + 1)/k, but no fixed vertex has expected payoff more than n(k + 1)/(2k), so no player has positive regret. However, at each time period, the player at the isolated vertex $v_{k-1}$ has incentive to deviate, so this is not a Nash equilibrium.

A similar example shows that even if all players minimize internal regret (so that play is guaranteed to converge to the set of correlated equilibria), play can cycle forever and so need not converge to Nash equilibrium.⁴

4 Valid games

4.1 Definitions and price of anarchy

Valid games, introduced by Vetta [30], are a broad class of games that includes the market sharing game studied by Goemans et al. [14], the facility location problem, a version of the traffic routing problem of Roughgarden and Tardos [28], and multiple-item auctions [30].
When describing valid games, we slightly adapt the notation of [30]. Consider a k-player maximization game, where each player i has a ground set of actions $\mathcal{V}_i$ from which she can play some subset. Not every subset of actions is necessarily allowed. Let $\mathcal{V} = \mathcal{V}_1 \times \cdots \times \mathcal{V}_k$, and let $\mathcal{A}_i = \{a_i \subseteq \mathcal{V}_i : a_i \text{ is a feasible action}\}$. Let the game have some social utility function $\gamma : 2^{\mathcal{V}} \to \mathbb{R}$, and let each player have a private utility function $\alpha_i : 2^{\mathcal{V}} \to \mathbb{R}$. The discrete derivative of f at $X \subseteq \mathcal{V}$ in the direction $D \subseteq \mathcal{V} - X$ is $f'_D(X) = f(X \cup D) - f(X)$.

Definition 4.1. A set function $f : 2^{\mathcal{V}} \to \mathbb{R}$ is submodular if for $A \subseteq B$, $f'_i(A) \ge f'_i(B)$ for all $i \in \mathcal{V} - B$.

Note that submodular utility functions represent the economic concept of decreasing marginal utility, reflecting economies of scale.

Definition 4.2. A game with private utility functions $\alpha_i : 2^{\mathcal{V}} \to \mathbb{R}$ and social utility function $\gamma : 2^{\mathcal{V}} \to \mathbb{R}$ is valid if γ is submodular and

$$\bar{\alpha}_i(S) \ge \bar{\gamma}'_{s_i}(S \oplus \emptyset_i) \qquad (1)$$

$$\sum_{i=1}^{k} \bar{\alpha}_i(S) \le \bar{\gamma}(S) \qquad (2)$$

Condition 1 states that each agent's payoff is at least her Vickrey utility, that is, the change in social utility that would occur if agent i did not participate in the game. Condition 2 states that the social utility of the game is at least the sum of the agents' private utilities.

⁴ k players play on a set of k/2 + 1 vertices. Players are divided into two equal sized groups, L and R. Every turn, there is exactly one player on k/2 vertices, and k/2 players on the remaining vertex. Players in L and R get their own vertices on alternate turns, and the crowded vertex rotates, so that each player is equally often on every vertex, and on any particular vertex she is equally often alone and crowded. Therefore no player has any incentive to swap any vertex with any other.

For example, consider the market sharing game studied by Goemans et al. [14]. The game is played on a bipartite graph G = ((V, U), E). Each vertex in V is a player, and each vertex in U is a market. Each market has a value and a cost to service it, and each player has a budget. A player may enter a set of markets to which she has edges, if the sum of their costs is at most her budget.
For each market that a player enters, she receives payoff equal to the value of that market divided by the number of players that chose to enter it. The social utility function is the sum of the individual player utilities, or equivalently, the sum of the values of the markets that have been entered by any player. This valid game models a situation in which cable internet providers enter different cities with values proportional to their populations and share the market equally with other local providers; the social utility is the number of people with access to high speed internet.

Vetta [30] analyzes the price of anarchy of valid games and shows that if S is a Nash equilibrium strategy and $\Omega = \{\sigma_1, \ldots, \sigma_k\}$ is a strategy profile optimizing the social utility function so that $\bar{\gamma}(\Omega) = OPT$, then

$$OPT \le 2\bar{\gamma}(S) - \sum_{i : \sigma_i = s_i} \bar{\gamma}'_{s_i}(S \oplus \emptyset_i) - \sum_{i : \sigma_i \ne s_i} \bar{\gamma}'_{s_i}\big(\Omega \cup (S \oplus \emptyset_i \oplus \cdots \oplus \emptyset_k)\big).$$

Thus, if γ is nondecreasing, then for any Nash equilibrium strategy S, $\gamma(S) \ge OPT/2$, giving a price of anarchy of 2. In contrast, Goemans et al. [15] show that the price of sinking for valid games is larger than n.

4.2 Price of total anarchy

In this section, we show that the price of total anarchy for valid games matches the price of anarchy exactly:

Theorem 4.3. If all players play regret-minimizing strategies for T rounds, with strategy profile $S^t$ at time t, then

$$OPT \le \frac{1}{T} \sum_{t=1}^{T} \left( 2\bar{\gamma}(S^t) - \sum_{i : \sigma_i = s_i^t} \bar{\gamma}'_{s_i^t}(S^t \oplus \emptyset_i) - \sum_{i : \sigma_i \ne s_i^t} \bar{\gamma}'_{s_i^t}\big(\Omega \cup (S^t \oplus \emptyset_i \oplus \cdots \oplus \emptyset_k)\big) \right) + \epsilon k.$$

We defer this proof to the appendix. For nondecreasing γ, we get the following corollary:

Corollary 4.4. If γ is nondecreasing, the price of total anarchy for valid games is asymptotically 2.

The price of anarchy and the price of sinking are both brittle to the addition of Byzantine players. In contrast, for nondecreasing social welfare functions γ, our price of total anarchy result holds even in the presence of arbitrarily many Byzantine players. In any valid game, suppose players 1, ..., k are regret minimizing. Let OPT = γ(Ω) be the optimal value for these players playing alone. Suppose there is some additional set of Byzantine players $\mathcal{B}$ that behave arbitrarily.

Theorem 4.5. Consider a valid game with nondecreasing social welfare function γ, where the k regret minimizing players play $S^1, \ldots, S^T$ over T time steps while the Byzantine players play $B^1, \ldots, B^T$. Then the average social welfare satisfies $\frac{1}{T} \sum_{t=1}^{T} \gamma(S^t \cup B^t) \ge OPT/2$.

Proof. We observe that

$$\begin{aligned}
\gamma(\Omega \cup B^t) &\le \gamma(\Omega \cup S^t \cup B^t) \\
&= \gamma(S^t \cup B^t) + \sum_{i : \sigma_i \ne s_i^t} \gamma'_{\sigma_i}\big(S^t \cup B^t \cup (\Omega \oplus \emptyset_i \oplus \cdots \oplus \emptyset_k)\big) \\
&\le \gamma(S^t \cup B^t) + \sum_{i : \sigma_i \ne s_i^t} \gamma'_{\sigma_i}(S^t \oplus \emptyset_i \cup B^t),
\end{aligned}$$

where the first inequality follows because γ is nondecreasing, and the third line follows from submodularity. We then have

$$\begin{aligned}
OPT &\le \gamma(\Omega \cup B^t) \\
&\le \gamma(S^t \cup B^t) + \sum_{i : s_i \ne \sigma_i} \gamma'_{\sigma_i}(S^t \oplus \emptyset_i \cup B^t) \\
&\le \gamma(S^t \cup B^t) + \sum_{i : s_i \ne \sigma_i} \alpha_i(S^t \oplus \sigma_i \cup B^t)
\end{aligned}$$

with the first line following because γ is nondecreasing, the second from the previous display, and the last from the Vickrey condition. Summing over T, this yields

$$T \cdot OPT \le \sum_{t=1}^{T} \gamma(S^t \cup B^t) + \sum_{t=1}^{T} \sum_{i : s_i \ne \sigma_i} \alpha_i(S^t \oplus \sigma_i \cup B^t).$$

Suppose $\sum_{t=1}^{T} \gamma(S^t \cup B^t) < T \cdot OPT/2$. Since

$$\sum_{t=1}^{T} \sum_{i=1}^{k} \alpha_i(S^t \cup B^t) \le \sum_{t=1}^{T} \sum_{i=1}^{k + |\mathcal{B}|} \alpha_i(S^t \cup B^t) \le \sum_{t=1}^{T} \gamma(S^t \cup B^t),$$

it must be that

$$\sum_{i=1}^{k} \sum_{t=1}^{T} \alpha_i(S^t \oplus \sigma_i \cup B^t) > \sum_{i=1}^{k} \sum_{t=1}^{T} \alpha_i(S^t \cup B^t),$$

and so there is some regret minimizing player i for whom $\sum_{t=1}^{T} \alpha_i(S^t \oplus \sigma_i \cup B^t) > \sum_{t=1}^{T} \alpha_i(S^t \cup B^t)$, violating the condition that she is regret minimizing.

Note that here we have shown that in a valid game with a nondecreasing social utility function, if k players minimize regret and an arbitrary number of Byzantine players are added to the system, the resulting social welfare is no worse than half the optimal social welfare for k players. This is a slightly different result than we showed for Hotelling games, where we were able to guarantee that each regret-minimizing player obtains at least half of her fair share of the entire game, regardless of what the other k − 1 players do. On the other hand, for valid games one clearly cannot obtain half of the optimum social welfare for $k + |\mathcal{B}|$ players since the Byzantine players need not be acting in even their own interest.
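To make the two validity conditions of Definition 4.2 concrete, the following sketch evaluates them on a small instance of the market sharing game described in Section 4.1. It ignores market costs and player budgets for brevity, and the function names, market labels, and values are all hypothetical choices made for illustration.

```python
from fractions import Fraction

def market_sharing(values, choices):
    """values: dict market -> integer value; choices: list of sets of markets.

    Returns (private utilities, social utility). Each entered market's value
    is split equally among the players who entered it.
    """
    entrants = {m: sum(1 for c in choices if m in c) for m in values}
    private = [sum(Fraction(values[m], entrants[m]) for m in c) for c in choices]
    social = sum(values[m] for m in values if entrants[m] > 0)
    return private, social

def vickrey_utility(values, choices, i):
    """Drop in social utility if player i plays the null strategy instead."""
    _, with_i = market_sharing(values, choices)
    without = choices[:i] + [set()] + choices[i + 1:]
    _, without_i = market_sharing(values, without)
    return with_i - without_i

if __name__ == "__main__":
    values = {"a": 6, "b": 4, "c": 3}
    choices = [{"a", "b"}, {"a"}, {"c"}]
    private, social = market_sharing(values, choices)
    print(private, social)
    print([vickrey_utility(values, choices, i) for i in range(3)])
```

On this instance the private utilities sum exactly to the social utility (condition 2 holds with equality) and each player's private utility is at least her Vickrey utility (condition 1).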
5 Atomic Congestion Games

In this section, we show price of total anarchy results matching existing price of anarchy results for atomic, unweighted congestion games with social utility equal to the sum of the player utilities [7, 1]. We also consider the atomic congestion game of weighted load balancing with social utility equal to the makespan [24, 8, 23], and show matching results for two links, but demonstrate that for n links, the price of total anarchy is exponentially worse than the price of anarchy. Finally, we consider weighted load balancing with social utility equal to the sum of the player utilities [29], and show that for $k \gg n$, the price of total anarchy is 1 + o(1). In the case of load balancing with sum social utility, our price of total anarchy results also imply previously unknown price of anarchy results for mixed strategies.

A congestion game is a minimization game consisting of a set of k players and, for each player i, a set $\mathcal{V}_i$ of facilities. Player i plays subsets of facilities from some feasible set $\mathcal{A}_i = \{a_i \subseteq \mathcal{V}_i : a_i \text{ is a feasible action}\}$. In weighted games, each player i has an associated weight $w_i$; in unweighted games, each player weight is 1. Each facility e has an associated latency function $f_e$. A player i playing $a_i$ experiences cost $\alpha_i = \sum_{e \in a_i} f_e(l_e)$, where $l_e$ is the load on facility e: $l_e = \sum_{j : e \in a_j} w_j$.
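The cost definition above can be sketched directly in code. The function name `congestion_costs`, the facility labels, and the linear latency functions below are illustrative assumptions; the sketch only evaluates loads and per-player costs for one fixed action profile.

```python
def congestion_costs(actions, weights, latency):
    """Loads and player costs in a (weighted) congestion game.

    actions[i]: set of facilities played by player i.
    weights[i]: weight of player i (all 1 in the unweighted case).
    latency[e]: latency function of facility e, mapping load to cost.
    """
    load = {}
    for a, w in zip(actions, weights):
        for e in a:
            load[e] = load.get(e, 0) + w
    # Each player pays the latency of every facility she uses at its total load.
    costs = [sum(latency[e](load[e]) for e in a) for a in actions]
    return load, costs

if __name__ == "__main__":
    latency = {"x": lambda l: l, "y": lambda l: 2 * l}  # hypothetical linear latencies
    actions = [{"x"}, {"x", "y"}, {"y"}]
    load, costs = congestion_costs(actions, [1, 1, 1], latency)
    print(load, costs, sum(costs))
```

Summing the returned costs gives the player-summed social cost used for the unweighted results in this section.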
