Reciprocity Evolution and Decision Games in Network and Data Science. PDF

475 Pages·2021·15.606 MB·English
by  coll.

Preview Reciprocity Evolution and Decision Games in Network and Data Science.

1 Basic Game Theory

Game theory is the study of mathematical modeling of strategic interactions among agents. Agents, or players in the game, are usually considered rational (i.e., seeking their own maximum benefit from the game’s outcome). The outcomes of the game, or the consequences of the actions of agents, depend on their (strategic) interactions and (nonstrategic) environmental factors. Thus, a rational player should observe, analyze, and predict the actions of other players in the game in order to select the appropriate action that leads to the most desired outcome. In game theory, the main goal is to predict the outcome of the game given the rationality of players and defined environments.

A game is composed of at least the following elements: players, actions, outcomes, and utility functions. Players, which are denoted as $\mathcal{N} = \{1, 2, 3, \ldots, N\}$, are the agents who would rationally maximize their utilities in the game. The way for Player $i$ to maximize the corresponding utility is to choose the action $a_i$ from the action set $A_i$. An action profile $a = \{a_1, a_2, \ldots, a_N\}$ represents the actions that each player may select in the game. When the action profile $a$ is determined, the outcome of the game can then be derived by the outcome function $O(a)$. Given the outcome, each player may receive a utility according to the utility function $U_i(O(a))$. In sum, we may describe a game as a tuple $G = \{\mathcal{N}, A, U, O\}$.

1.1 Strategic-Form Games and Nash Equilibrium

The strategic-form game is one of the most basic game structures, where the relation between outcome and action can be represented in a matrix form. A well-known strategic-form game is the prisoner’s dilemma. Two players are questioned by the prosecutor regarding the evidence of a crime. They may choose to stay silent or betray the other for a shorter prison sentence. The prisoner’s dilemma can be explained with the payoff matrix in Table 1.1. In the matrix, each cell represents the utility received by Players A and B, respectively, if they choose the action profile.

In a strategic-form game, we seek the Nash equilibrium, which represents the expected action profile (i.e., the outcome of the game) if the players are rational.
Table 1.1. Prisoner’s dilemma: a strategic-form game

(A, B)         Stay Silent    Betray
Stay Silent    (−1, −1)       (−8, 0)
Betray         (0, −8)        (−5, −5)

Definition 1.1 (Nash equilibrium) Nash equilibrium is the action profile $a^* = \{a_1^*, a_2^*, \ldots, a_N^*\} \in A$ such that

$$U_i(O(a_i^*, a_{-i}^*)) \ge U_i(O(a_i, a_{-i}^*)), \quad \forall i \in \mathcal{N},\ a_i \in A_i,$$

where $a_i$ is the action of Player $i$ and $a_{-i}$ is the action profile of all players except Player $i$.

The concept of the Nash equilibrium states that every player is satisfied with the selected action in the profile given the actions selected by other players in the profile. In other words, no player can receive higher utility by changing their own action. Therefore, rational players have no incentive to change their actions when the game falls into Nash equilibria. Notice that the Nash equilibrium or pure-strategy Nash equilibrium we defined in Definition 1.1 may not always be unique or even may not exist. It is necessary to analyze the existence and uniqueness of Nash equilibria in the defined game model.

In the prisoner’s dilemma game illustrated in Table 1.1, readers may observe that, given any action selected by one player, the other player will have higher utility if they choose to betray. Therefore, {Betray, Betray} is the unique Nash equilibrium. Given that a Nash equilibrium exists and is unique in the prisoner’s dilemma, we could predict that the final outcome of the game, if all players are rational, will be {Betray, Betray}.

One may notice that the best choice for the players, if they cooperate, would be {Stay Silent, Stay Silent}, for the utility of (−1, −1). Nevertheless, this better outcome is impossible in the prisoner’s dilemma, since both players have the incentive to betray for a shorter prison sentence. This rational choice eventually leads them to a worse outcome, which explains why this is a “dilemma.” This example suggests that rational decisions in the game may lead to suboptimal outcomes from the perspective of overall system efficiency.
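The uniqueness claim for the prisoner’s dilemma can be checked by brute force. The following is a minimal sketch, not taken from the book: it applies the inequality of Definition 1.1 to every action profile of Table 1.1 and keeps the profiles from which no player gains by deviating alone. The names `PAYOFFS`, `ACTIONS`, and `pure_nash_equilibria` are hypothetical, and the outcome function is folded into the payoff lookup for brevity.

```python
from itertools import product

ACTIONS = ("StaySilent", "Betray")

# Utilities (Player A, Player B) copied from Table 1.1.
PAYOFFS = {
    ("StaySilent", "StaySilent"): (-1, -1),
    ("StaySilent", "Betray"): (-8, 0),
    ("Betray", "StaySilent"): (0, -8),
    ("Betray", "Betray"): (-5, -5),
}


def pure_nash_equilibria(payoffs, actions):
    """Return every two-player profile where no unilateral deviation pays off."""
    equilibria = []
    for profile in product(actions, repeat=2):
        stable = True
        for player in range(2):
            for alternative in actions:
                # Change only this player's action, keep the opponent's fixed.
                deviated = list(profile)
                deviated[player] = alternative
                if payoffs[tuple(deviated)][player] > payoffs[profile][player]:
                    stable = False
        if stable:
            equilibria.append(profile)
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))
```

Exhaustive search is only practical for small games like this one, since the number of profiles grows exponentially with the number of players.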
1.2 Extensive-Form Games and Subgame-Perfect Nash Equilibrium

In a strategic-form game, players do not have knowledge of the actions selected by the other players. Such games are suitable for problems involving simultaneous decision-making or if the decisions are privately made. For scenarios in which the actions of players can be observed, either fully or partially, by other players, extensive-form games would be more suitable.

Figure 1.1 Ultimatum game

An extensive-form game can be represented by a game tree, in which the terminal nodes are the outcomes of the game with payoffs to the players, and the rest of the nodes are the decision timings when the players (and/or the nature) may select the actions (and/or external influence) to direct the game toward certain outcomes.

The ultimatum game, which is illustrated in Figure 1.1, is a famous example of an extensive-form game. In the ultimatum game, Player 1 is requested to propose an offer for sharing a cake, while Player 2 can choose to accept or reject the offer. If Player 2 accepts the offer, the cake will be allocated as is. Nevertheless, if Player 2 chooses to reject the offer, neither player will receive the cake.

In this game, we may identify three Nash equilibria according to Definition 1.1:

(1) Player 1 offers to fairly share the cake and Player 2 only accepts when the offer is fair, or {Fair, Fair|Accept, Unfair|Reject}.
(2) Player 1 offers to unfairly share the cake and Player 2 only accepts when the offer is unfair, or {Unfair, Fair|Reject, Unfair|Accept}.
(3) Player 1 offers to unfairly share the cake and Player 2 accepts any offer, or {Unfair, Fair|Accept, Unfair|Accept}.

The first one is the offer that we sometimes see when people are facing the situation similar to Player 2; that is, they will make an ultimatum or threat that they will damage both of them if they are treated unfairly. This is why the game is called the ultimatum game.
Nevertheless, although such a claim can be supported by a Nash equilibrium, it may not be a suitable solution concept for the extensive-form game, as it does not consider the fact that Player 2 already knows which action Player 1 has selected at the time of their decision.

The subgame-perfect Nash equilibrium is a refined solution concept for the extensive-form game. A subgame is a subtree of the game tree starting from any node and including all branches following the starting node. A subgame-perfect Nash equilibrium is defined as follows:

Definition 1.2 (Subgame-perfect Nash equilibrium) A Nash equilibrium is a subgame-perfect Nash equilibrium if it is a Nash equilibrium in every subgame.

The concept of the subgame-perfect Nash equilibrium captures the rationality of players when they know the actions of other players beforehand. In the ultimatum game, for instance, Player 2 already knows whether Player 1 has offered a fair share when they choose to accept or not. In both subgames, after Player 1 proposes the offer, the only Nash equilibrium in the subgames would be Player 2 choosing to accept the offer, since rejecting will give Player 2 zero utility. With this refinement, the only subgame-perfect Nash equilibrium would be {Unfair, Fair|Accept, Unfair|Accept}.

1.3 Incomplete Information: Signal and Bayesian Equilibrium

In some problems, the outcome of the game involves uncertainty. The uncertainty may come from the external factors that could not be observed directly or the private preferences and actions of the players. The exact outcome of the game therefore may not be known at the time of the decision-making. In such a game, rational players must estimate the influence of this uncertainty. Instead of seeking utility maximization, they aim to maximize their expected utility.

Let us consider a game with player set $\mathcal{N}$ and action set $A$. The game is in a state $\theta \in \Theta$, which is unknown to some or all players. The utility of Player $i$ is given by $U_i(a, \theta)$, $a \in A$, which depends on the action selected by Player $i$, the actions selected by the other players, and the state $\theta$.
The uncertainty in the state can be estimated if the distribution of the state is known. It can either be known in advance (signaling game) or learned about from the received information (social learning). Let us assume that the probability of the state $p_i$ (or belief) on state $\theta$ over state space $\Theta$ is known or derived after strategic thinking. Players then may maximize their expected utilities based on the belief as follows:

$$p_i(I_i) = \{p_{i,\theta} \mid \theta \in \Theta\}, \qquad \sum_{\theta \in \Theta} p_{i,\theta}(I_i) = 1,$$

where $I_i$ is the information received by Player $i$ in the game.

$$p_{i,\theta}(I_i) = \frac{\mathrm{Prob}(I_i \mid \theta)}{\sum_{\theta' \in \Theta} \mathrm{Prob}(I_i \mid \theta')}$$

Given this new objective of the players, we may extend the equilibrium concept to a Bayesian Nash equilibrium.

Definition 1.3 (Bayesian Nash equilibrium) A Bayesian Nash equilibrium is the action profile $a^*$, where

$$\sum_{\theta \in \Theta} p_{i,\theta}(I_i)\, U_i(a_i^*, a_{-i}^*, \theta) \ge \sum_{\theta \in \Theta} p_{i,\theta}(I_i)\, U_i(a_i, a_{-i}^*, \theta), \quad \forall i \in \mathcal{N},\ a_i \in A_i.$$

The reputation game is a famous example of a game with incomplete information. In the game we have two firms. Firm 1 is in the market and prefers a monopoly. Firm 2 is new and would like to enter the market. There are two possible types of Firm 1: Sane and Crazy, each with 0.5 probability. Their actions and corresponding utilities are illustrated in the game matrix in Table 1.2.

Table 1.2. Reputation game

                    Stay        Exit
Sane/Prey           (2, 5)      (X, 0)
Sane/Accommodate    (5, 5)      (10, 0)
Crazy/Prey          (0, −10)    (0, 0)

In such a game, there are two possible Bayesian Nash equilibria, while their existence depends on the value of X.

Pooling equilibrium: When X = 8, both Sane and Crazy Firm 1 will choose to prey. In such a case, Firm 2 has no way to distinguish between these two types. Given the distribution of the type, Firm 2 will choose to exit instead of stay.

Separating equilibrium: When X = 2, Sane Firm 1 will accommodate and Crazy Firm 1 will prey. Firm 2 will stay when seeing accommodate and exit when seeing prey. In this equilibrium, Firm 2 can judge Firm 1’s type through the observed action.
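Firm 2’s reasoning in both equilibria can be traced numerically. The sketch below is an illustration rather than the book’s procedure: it forms Firm 2’s belief about Firm 1’s type with Bayes’ rule (with the equal 0.5 priors, this coincides with the normalized-likelihood belief formula above) and then compares the expected utilities of Stay and Exit using Firm 2’s payoffs from Table 1.2. The names `PRIOR`, `FIRM2_UTILITY`, `posterior`, and `best_response` are hypothetical, and Firm 1 is assumed to play pure strategies.

```python
PRIOR = {"Sane": 0.5, "Crazy": 0.5}

# Firm 2's utilities from Table 1.2, keyed by
# (Firm 1 type, Firm 1 action, Firm 2 action).
FIRM2_UTILITY = {
    ("Sane", "Prey", "Stay"): 5, ("Sane", "Prey", "Exit"): 0,
    ("Sane", "Accommodate", "Stay"): 5, ("Sane", "Accommodate", "Exit"): 0,
    ("Crazy", "Prey", "Stay"): -10, ("Crazy", "Prey", "Exit"): 0,
}


def posterior(strategy, observed_action):
    """Belief over Firm 1's type after observing its action.

    `strategy` maps each type to the single action that type plays.
    """
    joint = {
        t: PRIOR[t] * (1.0 if strategy[t] == observed_action else 0.0)
        for t in PRIOR
    }
    total = sum(joint.values())
    if total == 0:
        raise ValueError("observed action is never played under this strategy")
    return {t: p / total for t, p in joint.items()}


def best_response(belief, observed_action):
    """Firm 2's expected utility for each reply, and the reply maximizing it."""
    expected = {
        reply: sum(
            belief[t] * FIRM2_UTILITY[(t, observed_action, reply)]
            for t in belief
            if belief[t] > 0  # skip types ruled out by the observation
        )
        for reply in ("Stay", "Exit")
    }
    return max(expected, key=expected.get), expected


pooling = {"Sane": "Prey", "Crazy": "Prey"}
separating = {"Sane": "Accommodate", "Crazy": "Prey"}

print(best_response(posterior(pooling, "Prey"), "Prey"))
print(best_response(posterior(separating, "Accommodate"), "Accommodate"))
print(best_response(posterior(separating, "Prey"), "Prey"))
```

Under pooling, observing Prey leaves the belief at the prior, so staying has expected utility 0.5 · 5 + 0.5 · (−10) = −2.5 and Firm 2 exits. Under separating, each observation reveals the type, so Firm 2 stays after Accommodate and exits after Prey.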
In other words, Firm 1’s action can be treated as a signal to improve the estimation of the unknown types. The rules of signaling in incomplete information will be discussed in detail in Part III.

1.4 Repeated Games and Stochastic Games

The previous game models are based on the assumption that the game will be played only once (i.e., a one-shot game). In practice, agents may face the same problem multiple times, each time with the same or different players. In such a scenario, we may formulate the problem as a repeated game. A repeated game consists of a series of base games in which the players play the same base game sequentially. It can be written as an extensive-form game by expanding the base game in a game tree repeatedly.

There are two kinds of repeated games: finite and infinite repeated games. For finite repeated games, the number of rounds of the base game is finite. This suggests that an ending base game exists, and the game tree in the extensive form is finite. In such a scenario, the game can be analyzed directly within the concept of a subgame-perfect Nash equilibrium. For the other case, where the rounds are infinite, there is no ending game and therefore a subgame-perfect Nash equilibrium cannot be applied directly.

The utility of Player $i$ in a repeated game can be written as follows:

$$U_i = \lim_{T \to \infty} \sum_{t=0}^{T} \delta^t u_i(a_i(t), a_{-i}(t)), \tag{1.1}$$

where $a_i(t)$ and $a_{-i}(t)$ are the actions selected by Player $i$ and the other players at round $t$, respectively, and $0 < \delta < 1$ is the discount factor for evaluating the utility in the future to the players in the present.

Taking the prisoner’s dilemma as an example, we may consider a repeated version of the prisoner’s dilemma, which is called the iterated prisoner’s dilemma. When the rounds are finite, it can be easily shown that the original {Betray, Betray} equilibrium still holds as the unique Nash equilibrium of the game.
Nevertheless, when the rounds are infinite, the equilibrium becomes nonunique as cooperation between players becomes possible. For instance, a tit-for-tat strategy (i.e., stay silent in the first round and then choose the action selected by the opponent in the previous round) is also a Nash equilibrium, since any deviation from such a strategy will lead to {Betray, Betray} in every round, while continuing to use the strategy will keep them at {Stay Silent, Stay Silent} and eventually lead to higher utility in the long run. This example suggests that cooperation is likely to emerge in a repeated game. Readers will find more related examples in Part I.

In practice, even if the agent is facing the same problem and applying the same action multiple times, the results could be different due to external uncertainty or randomness in the system. We may capture this characteristic with stochastic games, which are repeated games with uncertainty.

The uncertainty is captured through adding a state $\theta \in \Theta$ to the game. The state is observable by the players but it may change in the next round with the probability described by $P(\theta' \mid \theta, a)$, which depends on the current state and the action profiles selected by the players. The utility of the players depends on not only the action profile, but also the state. Therefore, the utility of Player $i$ in the stochastic game can be written as follows:

$$U_i = \lim_{T \to \infty} \Big[ u_i(\theta(0), a_i(0), a_{-i}(0)) + \sum_{t=1}^{T} P(\theta(t) \mid \theta(t-1), a(t-1))\, \delta^t u_i(\theta(t), a_i(t), a_{-i}(t)) \Big]. \tag{1.2}$$

Given that the state transitions depend on the previous state and the action profile, the system can be formulated as a Markov decision process when the action profile is given, which helps us to derive the expected utility given certain action profiles. Then, a Bayesian Nash equilibrium can be applied to derive the equilibrium of the game. Readers will find more examples in Parts II and III.

5 Indirect Reciprocity Data Fusion Game and Application to Cooperative Spectrum Sensing

Encouraging sensors to share their data is a crucial issue due to the importance of data sharing in data fusion.
In this chapter, we discuss a reputation-based incentive framework in which an indirect reciprocity game is used to model the data sharing stimulation problem. Within this framework, sensors decide on how to report their results to the fusion center (FC) and earn a reputation. On the basis of this reputation, they are capable of gaining benefits in the future. For accuracy of fusion and sensing, reputation distribution is introduced within the game, in which the game’s Nash equilibrium (NE) is derived and the corresponding uniqueness is proven theoretically. Furthermore, the scheme is applied to cooperative spectrum sensing. It is shown that with an appropriate cost-to-gain ratio, when the average received energy exceeds a given threshold, the optimal strategy for the secondary users (SUs) is to report; otherwise, they would remain silent. It is also verified that this kind of optimal strategy is a desirable evolutionarily stable strategy (ESS).

5.1 Introduction

Today, in the big data era, with diversified measurement techniques in various domains, information about a system of interest or a phenomenon can be acquired through different types of sensors (as well as data suppliers). In order to extract knowledge for various purposes and analyze information from multiple sensors jointly, concepts of data fusion were developed and corresponding techniques were introduced [1]. The data fusion process’s analytic outcomes enable users to acquire a comprehensive view and a more united picture of the system, make more accurate decisions, and answer questions about the system.
Data fusion methodologies were first created for military applications and then evolved into nonmilitary fields such as driver assistance systems [2], cognitive radio networks (CRNs) [3], and smart grids [4]. Numerous algorithms have been proposed to exploit the diversity of multiple sensors effectively, including those based on soft fusion and hard fusion. Within the soft fusion-based algorithms, a quantized version of a local decision statistic, such as any suitably sufficient statistic or the log-likelihood ratio, is sent to the FC, while within the hard fusion-based algorithms, a one-bit hard local decision is sent by every sensor to the FC.

It is verified that the existing fusion algorithms could improve system reliability, but most of them assume that every sensor is altruistic and willing to unconditionally share data for fusion, which may not be true in reality. Actually, if cooperation does not bring benefits to sensors, on account of the extra energy and cost required in collecting and reporting data, the sensors might choose not to cooperate. Accordingly, the fusion algorithms would not run properly due to a lack of sufficient data. Therefore, stimulating different data providers to collaborate and share their data is a crucial topic.

As for cooperation stimulation schemes, reputation-based and payment-based schemes are the two most common types. Reputation-based cooperation stimulation schemes have been broadly discussed within wireless data networks [5], ad hoc networks [6], and peer-to-peer (P2P) networks [7] to guarantee trustworthy communications, or in CRNs [8] and wireless sensor networks [9] to guarantee data fusion. However, few reputation-based schemes have been developed in order to stimulate sensors to share their sensing results until recently, and this kind of approach has no theoretical justification yet.
Payment-based schemes such as auctions and virtual currencies have been proposed to guarantee participatory sensing [10] within wireless communication systems, file sharing within P2P networks [11], and dynamic spectrum sharing within CRNs [12]. Though these schemes have produced promising results, they come with limitations such as the need for central banking server(s) for the sake of security control or the requirement of tamper-proof hardware.

Another way of analyzing the cooperation problem is game theory. Specifically, a number of cooperation schemes on the basis of cooperative game theory such as the coalition game [13,14] and the bargaining game [15,16] have been introduced. The focus of cooperative game-theoretic frameworks is allocating and acquiring net income within an alliance. In cooperative games, players have an interest in maximizing their outcome, though when considering the benefits on both sides, they might want to accept a solution that involves bargaining. Thus, a stimulation mechanism or an enforceable contract is needed to guarantee cooperation. Furthermore, most game-theoretic frameworks are based on the direct reciprocity model, in which players evaluate only their opponents’ behaviors rather than all players’ behaviors. According to the backward induction principle and the prisoner’s dilemma, two directly participating players would cooperate only when the game is repeated infinitely many times. However, players could alter their partners periodically to obtain better performance due to mobility or changes of environment; thus, for them, playing noncooperatively is the only optimal strategy. In this case, they also need a stimulation mechanism.

Indirect reciprocity has been broadly adopted within social science and evolutionary biology [17,18] and has recently drawn a lot of attention in applications such as energy exchange [19], package forwarding [20] and cooperative transmission [21,22].
Its concept is “I help you not because you have helped me but because you have helped others,” which means that the current donor could give help to the recipient because the recipient has helped others as a donor before. This signifies that the evaluations from both other observers and opponents would be taken into consideration, and this endows players with the incentive to cooperate even when the game is not repeated indefinitely.

In this chapter, we first discuss a framework that achieves successful data fusion by incorporating an incentive mechanism on the basis of indirect reciprocity and stimulating cooperation among sensors that are not required to play with the same group of sensors all of the time. Then, the application to the cooperative spectrum sensing (CSS) system is discussed.

5.2 Indirect Reciprocity Data Fusion Game

5.2.1 System Model

A data fusion system is considered as shown in Figure 5.1, where there are $M$ sensors sensing and obtaining local observations. It is assumed that the local observations are obtained for classic hypothesis tests. Assume that the hypotheses can be evaluated from the signal energy that each sensor receives (i.e., energy detection). Let $S_m = \frac{1}{N} \sum_{t=1}^{N} |r_m(t)|^2$ denote the signal’s average energy that is received by sensor $m$, where $N$ represents the number of signal samples and $r_m(t)$ represents the sample of the observed signal at sensor $m$’s receiver, which can be written as

$$r_m(t) = \begin{cases} h_m s(t) + n_m(t), & \text{if } H_1, \\ n_m(t), & \text{if } H_0. \end{cases} \tag{5.1}$$

Figure 5.1 The system model.
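The energy statistic $S_m$ and the signal model in Eq. (5.1) can be sketched in a few lines. This is an illustrative reading, not the book’s implementation: the excerpt ends before defining $h_m$, $s(t)$, and $n_m(t)$, so treating them as a channel gain, the transmitted signal, and receiver noise is an assumption, as are the function names, the Gaussian noise, the unit-amplitude signal, and the gain of 0.8 used in `demo`.

```python
import random


def received_samples(h_m, signal, noise, signal_present):
    """Eq. (5.1): h_m * s(t) + n_m(t) under H1, n_m(t) alone under H0."""
    if signal_present:
        return [h_m * s + n for s, n in zip(signal, noise)]
    return list(noise)


def average_energy(samples):
    """S_m = (1/N) * sum over t of |r_m(t)|^2."""
    return sum(abs(r) ** 2 for r in samples) / len(samples)


def demo(num_samples=1000, seed=0):
    """Compare S_m under H0 and H1 for hypothetical signal and noise."""
    rng = random.Random(seed)
    signal = [1.0] * num_samples  # assumed unit-amplitude s(t)
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]  # assumed n_m(t)
    for present in (False, True):
        energy = average_energy(received_samples(0.8, signal, noise, present))
        print(f"signal present={present}: S_m={energy:.3f}")


demo()
```

Using `abs(r) ** 2` keeps the function valid for complex-valued samples as well as real ones.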
