arXiv:1107.2009v2 [cs.GT] 3 Jan 2012

Robustness of Structurally Equivalent Concurrent Parity Games⋆

Krishnendu Chatterjee

IST Austria (Institute of Science and Technology Austria)

Abstract. We consider two-player stochastic games played on a finite state space for an infinite number of rounds. The games are concurrent: in each round, the two players (player 1 and player 2) choose their moves independently and simultaneously; the current state and the two moves determine a probability distribution over the successor states. We also consider the important special case of turn-based stochastic games, where players make moves in turns rather than concurrently. We study concurrent games with ω-regular winning conditions specified as parity objectives. The value for player 1 for a parity objective is the maximal probability with which the player can guarantee the satisfaction of the objective against all strategies of the opponent. We study the problem of continuity and robustness of the value function in concurrent and turn-based stochastic parity games with respect to imprecision in the transition probabilities. We present quantitative bounds on the difference of the value function (in terms of the imprecision of the transition probabilities) and show value continuity for structurally equivalent concurrent games (two games are structurally equivalent if the supports of the transition functions are the same and the probabilities differ). We also show robustness of optimal strategies for structurally equivalent turn-based stochastic parity games. Finally, we show that the value continuity property breaks without the structural equivalence assumption (even for Markov chains), and show that our quantitative bound is asymptotically optimal. Hence our results are tight (the assumption is both necessary and sufficient) and optimal (our quantitative bound is asymptotically optimal).

1 Introduction

Concurrent stochastic games are played by two players on a finite state space for an infinite number of rounds.
In every round, the two players simultaneously and independently choose moves (or actions), and the current state and the two chosen moves determine a probability distribution over the successor states. The outcome of the game (or a play) is an infinite sequence of states. These games were introduced by Shapley [24], and have been one of the most fundamental and well-studied game models in stochastic graph games. We consider ω-regular objectives specified as parity objectives; that is, given an ω-regular set Φ of infinite state sequences, player 1 wins if the outcome of the game lies in Φ. Otherwise, player 2 wins, i.e., the game is zero-sum. The class of concurrent stochastic games subsumes many other important classes of games as sub-classes: (1) turn-based stochastic games, where in every round only one player chooses moves (i.e., the players make moves in turns); and (2) Markov decision processes (MDPs) (one-player stochastic games). Concurrent games and the sub-classes provide a rich framework to model various classes of dynamic reactive systems, and ω-regular objectives provide a robust specification language to express all commonly used properties in verification; all ω-regular objectives can be expressed as parity objectives. Thus concurrent games with parity objectives provide the mathematical framework to study many important problems in the synthesis and verification of reactive systems [6,23,21] (see also [1,14,2]).

⋆ The research was supported by Austrian Science Fund (FWF) Grant No P23499-N23, FWF NFN Grant No S11407-N23 (RiSE), ERC Start grant (279307: Graph Games), and Microsoft faculty fellows award.

The player-1 value v_1(s) of the game at a state s is the limit probability with which player 1 can ensure that the outcome of the game lies in Φ; that is, the value v_1(s) is the maximal probability with which player 1 can guarantee Φ against all strategies of player 2.
Symmetrically, the player-2 value v_2(s) is the limit probability with which player 2 can ensure that the outcome of the game lies outside Φ. The problem of studying the computational complexity of MDPs, turn-based stochastic games, and concurrent games with parity objectives has received a lot of attention in the literature. Markov decision processes with ω-regular objectives have been studied in [8,9,4]; the results show the existence of pure (deterministic) memoryless (stationary) optimal strategies for parity objectives, and value computation is achievable in polynomial time. Turn-based stochastic games with the special case of reachability objectives have been studied in [7]: the existence of pure memoryless optimal strategies has been established, and the decision problem of whether the value at a state is at least a given rational value lies in NP ∩ coNP. The existence of pure memoryless optimal strategies for turn-based stochastic games with parity objectives was established in [5,28], and again the decision problem lies in NP ∩ coNP. Concurrent parity games have been studied in [10,12,3,15]; for concurrent parity games optimal strategies need not exist, ε-optimal strategies (for ε > 0) require both infinite memory and randomization in general, and the decision problem can be solved in PSPACE.

Almost all results in the literature consider the problem of computing values and optimal strategies when the game model is given precisely along with the objective. However, it is often unrealistic to know the precise transition probabilities, which are only estimated through observation. Since the transition probabilities are not known precisely, an extremely important question is how robust the analysis of concurrent games and their sub-classes with parity objectives is with respect to small changes in the transition probabilities. This question has been largely ignored in the study of concurrent and turn-based stochastic parity games.
In this paper we study the following problems related to continuity and robustness of values: (1) (continuity of values) under what conditions can continuity of the value function be proved for concurrent parity games; (2) (robustness of values) can quantitative bounds be obtained on the difference of the value function in terms of the difference of the transition probabilities; and (3) (robustness of optimal strategies) do optimal strategies of a game remain ε-optimal, for ε > 0, if the transition probabilities are slightly changed.

Our contributions. Our contributions are as follows:

1. We consider structurally equivalent game structures, where the supports of the transition probabilities are the same, but the precise transition probabilities may differ. We show the following results for structurally equivalent concurrent parity games:

(a) Quantitative bound. We present a quantitative bound on the difference of the value functions of two structurally equivalent game structures in terms of the difference of the transition probabilities. We show that when the difference in the transition probabilities is small, our bound is asymptotically optimal. Our example to show the matching lower bound is on a Markov chain, and thus our result shows that the bound for a Markov chain can be generalized to concurrent games.

(b) Value continuity. We show value continuity for structurally equivalent concurrent parity games, i.e., as the difference in the transition probabilities goes to 0, the difference in value functions also goes to 0. We then show that the structural equivalence assumption is necessary: we show a family of Markov chains (that are not structurally equivalent) where the difference of the transition probabilities goes to 0, but the difference in the value functions is 1. It follows that the structural equivalence assumption is both necessary (even for Markov chains) and sufficient (even for concurrent games).
It follows from the above that our results are both optimal (quantitative bounds) as well as tight (the assumption is both necessary and sufficient). Our result for concurrent parity games is also a significant quantitative generalization of a result for concurrent parity games of [10], which shows that the set of states with value 1 remains the same if the games are structurally equivalent. We also argue that the structural equivalence assumption is not unrealistic in many cases: a reactive system consists of many state variables, and given a state (valuation of variables) it is typically known which variables are possibly updated; what is unknown is the precise transition probabilities (which are estimated by observation). Thus the system that is obtained for analysis is structurally equivalent to the underlying original system and only differs in the precise transition probabilities.

2. For turn-based stochastic parity games the value continuity and the quantitative bounds are the same as for concurrent games. We also prove a stronger result for structurally equivalent turn-based stochastic games showing that, along with continuity of the value function, there is also a robustness property for pure memoryless optimal strategies. More precisely, for all ε > 0, we present a bound β > 0 such that any pure memoryless optimal strategy in a turn-based stochastic parity game is an ε-optimal strategy in every structurally equivalent turn-based stochastic game in which the transition probabilities differ by at most β. Our result has deep significance as it allows the rich literature of work on turn-based stochastic games to carry over robustly to structurally equivalent turn-based stochastic games. As argued before, the model of turn-based stochastic game obtained for analysis may differ slightly in the precise transition probabilities, and our results show that the analysis on the slightly imprecise model using the classical results carries over to the underlying original system with small error bounds.
Our results are obtained as follows. The result of [11] shows that the value function for concurrent parity games can be characterized as the limit of the value function of concurrent multi-discounted games (concurrent discounted games with different discount factors associated with every state). There exists a bound on the difference of the value functions of discounted games [16]; however, the bound depends on the discount factor, and in the limit gives trivial bounds (and in general this approach does not work, as value continuity cannot be proved in general and the structural equivalence assumption is necessary). We use a classical result on Markov chains by Friedlin and Wentzell [17] and generalize a result of Solan [25] from Markov chains with a single discount to Markov chains with multi-discounted objectives, to obtain a bound that is independent of the discount factors for structurally equivalent games. The bound then also applies when we take the limit of the discount factors, and gives us the desired bound.

Our paper is organized as follows: in Section 2 we present the basic definitions; in Section 3 we consider Markov chains with multi-discounted and parity objectives; in Section 4 (Subsection 4.1) we prove the results related to turn-based stochastic games (item (2) of our contributions); and finally in Subsection 4.2 we present the quantitative bound and value continuity for concurrent games, along with the two examples illustrating that the bound is asymptotically optimal and that the structural equivalence assumption is necessary. Detailed proofs are presented in the appendix.

2 Definitions

In this section we define game structures, strategies, objectives, values, and present other preliminary definitions.

Probability distributions. For a finite set A, a probability distribution on A is a function δ : A ↦ [0,1] such that Σ_{a∈A} δ(a) = 1. We denote the set of probability distributions on A by D(A). Given a distribution δ ∈ D(A), we denote by Supp(δ) = {x ∈ A | δ(x) > 0} the support of the distribution δ.

Concurrent game structures. A (two-player) concurrent stochastic game structure G = ⟨S, A, Γ_1, Γ_2, δ⟩ consists of the following components.

– A finite state space S and a finite set A of moves (or actions).
– Two move assignments Γ_1, Γ_2 : S ↦ 2^A \ ∅. For i ∈ {1,2}, the assignment Γ_i associates with each state s ∈ S the nonempty set Γ_i(s) ⊆ A of moves available to player i at state s.

– A probabilistic transition function δ : S × A × A ↦ D(S), which associates with every state s ∈ S and moves a_1 ∈ Γ_1(s) and a_2 ∈ Γ_2(s) a probability distribution δ(s, a_1, a_2) ∈ D(S) for the successor state.

Plays. At every state s ∈ S, player 1 chooses a move a_1 ∈ Γ_1(s), and simultaneously and independently player 2 chooses a move a_2 ∈ Γ_2(s). The game then proceeds to the successor state t with probability δ(s, a_1, a_2)(t), for all t ∈ S. For all states s ∈ S and moves a_1 ∈ Γ_1(s) and a_2 ∈ Γ_2(s), we indicate by Dest(s, a_1, a_2) = Supp(δ(s, a_1, a_2)) the set of possible successors of s when moves a_1, a_2 are selected. A path or a play of G is an infinite sequence ω = ⟨s_0, s_1, s_2, ...⟩ of states in S such that for all k ≥ 0, there are moves a_1^k ∈ Γ_1(s_k) and a_2^k ∈ Γ_2(s_k) such that s_{k+1} ∈ Dest(s_k, a_1^k, a_2^k). We denote by Ω the set of all paths. We denote by θ_i the random variable that denotes the i-th state of a path. For a play ω = ⟨s_0, s_1, s_2, ...⟩ ∈ Ω, we define Inf(ω) = {s ∈ S | s_k = s for infinitely many k ≥ 0} to be the set of states that occur infinitely often in ω.

Special classes of games. We consider the following special classes of concurrent games.

1. Turn-based stochastic games. A game structure G is turn-based stochastic if at every state at most one player can choose among multiple moves; that is, for every state s ∈ S there exists at most one i ∈ {1,2} with |Γ_i(s)| > 1.

2. Markov decision processes. A game structure is a player-1 Markov decision process (MDP) if for all s ∈ S we have |Γ_2(s)| = 1, i.e., only player 1 has a choice of actions in the game. Similarly, a game structure is a player-2 MDP if for all s ∈ S we have |Γ_1(s)| = 1.

3. Markov chains. A game structure is a Markov chain if for all s ∈ S we have |Γ_1(s)| = 1 and |Γ_2(s)| = 1. Hence in a Markov chain the players do not matter, and for the rest of the paper a Markov chain consists of a tuple (S, δ), where δ : S ↦ D(S) is the probabilistic transition function.
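As a concrete aside (not from the paper), a finite concurrent game structure can be encoded directly from the definition above. The sketch below uses our own illustrative names and numbers: δ is a nested dictionary indexed by state and move pair, with a well-formedness check and the Dest operator.

```python
# A minimal encoding (illustrative names, not from the paper) of a
# concurrent game structure G = <S, A, Gamma1, Gamma2, delta>.
S = ["s", "t"]
Gamma1 = {"s": ["a", "b"], "t": ["a"]}   # moves available to player 1
Gamma2 = {"s": ["x", "y"], "t": ["x"]}   # moves available to player 2

# delta[s][(a1, a2)] is a probability distribution over successor states.
delta = {
    "s": {("a", "x"): {"t": 1.0}, ("a", "y"): {"s": 1.0},
          ("b", "x"): {"s": 1.0}, ("b", "y"): {"s": 0.5, "t": 0.5}},
    "t": {("a", "x"): {"t": 1.0}},
}

def is_well_formed():
    """Every pair of enabled moves must yield a probability distribution."""
    return all(abs(sum(delta[s][(a1, a2)].values()) - 1.0) < 1e-9
               and all(p >= 0 for p in delta[s][(a1, a2)].values())
               for s in S for a1 in Gamma1[s] for a2 in Gamma2[s])

def dest(s, a1, a2):
    """Dest(s, a1, a2) = Supp(delta(s, a1, a2))."""
    return {t for t, p in delta[s][(a1, a2)].items() if p > 0}

print(is_well_formed())             # True
print(sorted(dest("s", "b", "y")))  # ['s', 't']
```

At state s both players have two available moves, so this structure is genuinely concurrent rather than turn-based; shrinking Γ_2(s) to a singleton would make it a player-1 MDP in the sense of the classification above.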
Strategies. A strategy for a player is a recipe that describes how to extend a play. Formally, a strategy for player i ∈ {1,2} is a mapping π_i : S^+ ↦ D(A) that associates with every nonempty finite sequence x ∈ S^+ of states, representing the past history of the game, a probability distribution π_i(x) used to select the next move. The strategy π_i can prescribe only moves that are available to player i; that is, for all sequences x ∈ S^* and states s ∈ S, we require that Supp(π_i(x · s)) ⊆ Γ_i(s). We denote by Π_i the set of all strategies for player i ∈ {1,2}.

Given a state s ∈ S and two strategies π_1 ∈ Π_1 and π_2 ∈ Π_2, we define Outcome(s, π_1, π_2) ⊆ Ω to be the set of paths that can be followed by the game when the game starts from s and the players use the strategies π_1 and π_2. Formally, ⟨s_0, s_1, s_2, ...⟩ ∈ Outcome(s, π_1, π_2) if s_0 = s and if for all k ≥ 0 there exist moves a_1^k ∈ Γ_1(s_k) and a_2^k ∈ Γ_2(s_k) such that (i) π_1(s_0, ..., s_k)(a_1^k) > 0; (ii) π_2(s_0, ..., s_k)(a_2^k) > 0; and (iii) s_{k+1} ∈ Dest(s_k, a_1^k, a_2^k). Once the starting state s and the strategies π_1 and π_2 for the two players have been chosen, the probabilities of events are uniquely defined [27], where an event A ⊆ Ω is a measurable set of paths.¹ For an event A ⊆ Ω, we denote by Pr_s^{π_1,π_2}(A) the probability that a path belongs to A when the game starts from s and the players use the strategies π_1 and π_2.

Classification of strategies. We consider the following special classes of strategies.

1. (Pure). A strategy π is pure (deterministic) if for all x ∈ S^+ there exists a ∈ A such that π(x)(a) = 1. Thus, deterministic strategies are equivalent to functions S^+ ↦ A.

2. (Finite-memory).
Strategies in general are history-dependent and can be represented as follows: let M be a set called memory, to remember the history of plays (the set M can be infinite in general). A strategy with memory can be described as a pair of functions: (a) a memory update function π_u : S × M ↦ M that, given the memory M with the information about the history and the current state, updates the memory; and (b) a next move function π_n : S × M ↦ D(A) that, given the memory and the current state, specifies the next move of the player. A strategy is finite-memory if the memory M is finite.

¹ To be precise, we should define events as measurable sets of paths sharing the same initial state, and we should replace our events with families of events, indexed by their initial state. However, our (slightly) improper definition leads to more concise notation.

3. (Memoryless). A memoryless strategy is independent of the history of play and only depends on the current state. Formally, for a memoryless strategy π we have π(x · s) = π(s) for all s ∈ S and all x ∈ S^*. Thus memoryless strategies are equivalent to functions S ↦ D(A).

4. (Pure memoryless). A strategy is pure memoryless if it is both pure and memoryless. Pure memoryless strategies neither use memory nor use randomization, and are equivalent to functions S ↦ A.

Qualitative objectives. We specify qualitative objectives for the players by providing the set of winning plays Φ ⊆ Ω for each player. In this paper we study only zero-sum games [22,16], where the objectives of the two players are complementary. A general class of objectives are the Borel objectives [19]. A Borel objective Φ ⊆ S^ω is a Borel set in the Cantor topology on S^ω. In this paper we consider ω-regular objectives, which lie in the first 2½ levels of the Borel hierarchy (i.e., in the intersection of Σ_3 and Π_3) [26]. All ω-regular objectives can be specified as parity objectives, and hence in this work we focus on parity objectives, which are defined as follows.

– Parity objectives. For c, d ∈ N, we let [c..d] = {c, c+1, ..., d}. Let p : S ↦ [0..d] be a function that assigns a priority p(s) to every state s ∈ S, where d ∈ N.
The Even parity objective requires that the minimum priority visited infinitely often is even. Formally, the set of winning plays is defined as Parity(p) = {ω ∈ Ω | min(p(Inf(ω))) is even}.

Quantitative objectives. Quantitative objectives are measurable functions f : Ω ↦ R. We will consider multi-discounted objective functions, as there is a close connection between concurrent games with multi-discounted objectives and concurrent games with parity objectives. Given a concurrent game structure with state space S, let λ be a discount vector that assigns to every s ∈ S a discount factor 0 < λ(s) < 1 (unless otherwise mentioned we will always consider discount vectors λ such that for all s ∈ S we have 0 < λ(s) < 1). Let r : S ↦ R be a reward function that assigns a real-valued reward r(s) to every state s ∈ S. The multi-discounted objective function MDT(λ, r) : Ω ↦ R maps every path to the mean-discounted reward of the path. Formally, the function is defined as follows: for a path ω = s_0 s_1 s_2 ... we have

  MDT(λ, r)(ω) = [ Σ_{j=0}^∞ (Π_{i=0}^j λ(s_i)) · r(s_j) ] / [ Σ_{j=0}^∞ (Π_{i=0}^j λ(s_i)) ].

Also note that a parity objective Φ can be interpreted as a function Φ : Ω ↦ {0,1} by simply considering the characteristic function that assigns 1 to paths that belong to Φ and 0 otherwise.

Values, optimality, ε-optimality. Given an objective Φ which is a measurable function Φ : Ω ↦ R, we define the value for player 1 of game G with objective Φ from the state s ∈ S as Val(G, Φ)(s) = sup_{π_1∈Π_1} inf_{π_2∈Π_2} E_s^{π_1,π_2}(Φ); i.e., the value is the maximal expectation with which player 1 can guarantee the satisfaction of Φ against all player-2 strategies. Given a player-1 strategy π_1, we use the notation Val^{π_1}(G, Φ)(s) = inf_{π_2∈Π_2} E_s^{π_1,π_2}(Φ). A strategy π_1 for player 1 is optimal for an objective Φ if for all states s ∈ S, we have Val^{π_1}(G, Φ)(s) = Val(G, Φ)(s).
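As an aside, the mean-discounted objective defined above can be evaluated numerically on a truncation of an eventually periodic path; since the weights Π_{i≤j} λ(s_i) decay geometrically, the truncated sums converge quickly. The sketch below (truncation length and names are our choices, not the paper's) illustrates this.

```python
def mdt(lam, r, cycle, n=2000):
    """Truncated evaluation of the mean-discounted objective
    MDT(lam, r)(omega) = sum_j (prod_{i<=j} lam(s_i)) * r(s_j)
                       / sum_j (prod_{i<=j} lam(s_i))
    on the first n states of the periodic path cycle^omega."""
    num = den = 0.0
    w = 1.0                       # running product prod_{i<=j} lam(s_i)
    for j in range(n):
        s = cycle[j % len(cycle)]
        w *= lam[s]               # weights decay geometrically, so the tail is negligible
        num += w * r[s]
        den += w
    return num / den

lam = {"s": 0.9, "t": 0.8}        # per-state discount factors (illustrative)
r = {"s": 1.0, "t": 0.0}          # reward 1 on s, reward 0 on t

print(round(mdt(lam, r, ["s"]), 6))   # 1.0  (constant reward 1)
print(round(mdt(lam, r, ["t"]), 6))   # 0.0  (constant reward 0)
v = mdt(lam, r, ["s", "t"])           # strictly between the two rewards
print(0.0 < v < 1.0)                  # True
```

Since the numerator is a convex combination of the rewards along the path, every path value lies between the minimum and maximum reward, consistent with parity objectives being the {0,1}-valued special case mentioned above.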
For ε > 0, a strategy π_1 for player 1 is ε-optimal if for all states s ∈ S, we have Val^{π_1}(G, Φ)(s) ≥ Val(G, Φ)(s) − ε. The notions of values, optimal and ε-optimal strategies for player 2 are defined analogously. The following theorem summarizes the results in the literature related to determinacy and memory complexity of concurrent games and their sub-classes for parity and multi-discounted objectives.

Theorem 1. The following assertions hold:

1. (Determinacy [20]). For all concurrent game structures and for all parity and multi-discounted objectives Φ we have sup_{π_1∈Π_1} inf_{π_2∈Π_2} E_s^{π_1,π_2}(Φ) = inf_{π_2∈Π_2} sup_{π_1∈Π_1} E_s^{π_1,π_2}(Φ).

2. (Memory complexity). For all concurrent game structures and for all multi-discounted objectives Φ, randomized memoryless optimal strategies exist [24]. For all turn-based stochastic game structures and for all multi-discounted objectives Φ, pure memoryless optimal strategies exist [16]. For all turn-based stochastic game structures and for all parity objectives Φ, pure memoryless optimal strategies exist [5,28]. In general optimal strategies need not exist in concurrent games with parity objectives, and ε-optimal strategies, for ε > 0, need both randomization and infinite memory in general [10].

The results of [11] established that the value of concurrent games with certain special multi-discounted objectives can be characterized as valuations of a quantitative discounted µ-calculus formula. In the limit, the value function of the discounted µ-calculus formula characterizes the value function of concurrent games with parity objectives. An elegant interpretation of the result was given in [18], and from the interpretation we obtain the following theorem.
Theorem 2 ([11,18]). Let G be a concurrent game structure with a parity objective Φ defined by a priority function p. Let r be a reward function that assigns reward 1 to even-priority states and reward 0 to odd-priority states. Then there exists an order s_1 s_2 ... s_n on the states (where S = {s_1, s_2, ..., s_n}), dependent only on the priority function p, such that

  Val(G, Φ) = lim_{λ(s_1)→1} lim_{λ(s_2)→1} ... lim_{λ(s_n)→1} Val(G, MDT(λ, r));

in other words, if we consider the value function Val(G, MDT(λ, r)) with the multi-discounted objective and take the limit of the discount factors to 1 in the order of the states, we obtain the value function for the parity objective.

We now present notions related to structurally equivalent game structures and distances.

Structurally equivalent game structures. Given two game structures G_1 = ⟨S, A, Γ_1, Γ_2, δ_1⟩ and G_2 = ⟨S, A, Γ_1, Γ_2, δ_2⟩ on the same state and action space, with different transition functions, we say that G_1 and G_2 are structurally equivalent (denoted G_1 ≡ G_2) if for all s ∈ S and all a_1 ∈ Γ_1(s) and a_2 ∈ Γ_2(s) we have Supp(δ_1(s, a_1, a_2)) = Supp(δ_2(s, a_1, a_2)). Similarly, two Markov chains G_1 = (S, δ_1) and G_2 = (S, δ_2) are structurally equivalent (denoted G_1 ≡ G_2) if for all s ∈ S we have Supp(δ_1(s)) = Supp(δ_2(s)). For a game structure G (resp. Markov chain G), we denote by [[G]]_≡ the set of all game structures (resp. Markov chains) that are structurally equivalent to G.

Ratio and absolute distances. Given two game structures G_1 = ⟨S, A, Γ_1, Γ_2, δ_1⟩ and G_2 = ⟨S, A, Γ_1, Γ_2, δ_2⟩, the absolute distance of the game structures is the maximum absolute difference in the transition probabilities. Formally,

  dist_A(G_1, G_2) = max_{s,t∈S, a∈Γ_1(s), b∈Γ_2(s)} |δ_1(s, a, b)(t) − δ_2(s, a, b)(t)|.

The absolute distance for two Markov chains G_1 = (S, δ_1) and G_2 = (S, δ_2) is dist_A(G_1, G_2) = max_{s,t∈S} |δ_1(s)(t) − δ_2(s)(t)|. We now define the ratio distance between two structurally equivalent game structures and Markov chains.
Let G_1 and G_2 be two structurally equivalent game structures. The ratio distance is defined on the ratios of the transition probabilities. Formally,

  dist_R(G_1, G_2) = max { δ_1(s,a,b)(t)/δ_2(s,a,b)(t), δ_2(s,a,b)(t)/δ_1(s,a,b)(t) | s ∈ S, a ∈ Γ_1(s), b ∈ Γ_2(s), t ∈ Supp(δ_1(s,a,b)) = Supp(δ_2(s,a,b)) } − 1.

The ratio distance between two structurally equivalent Markov chains G_1 and G_2 is

  dist_R(G_1, G_2) = max { δ_1(s)(t)/δ_2(s)(t), δ_2(s)(t)/δ_1(s)(t) | s ∈ S, t ∈ Supp(δ_1(s)) = Supp(δ_2(s)) } − 1.

Remarks about the distance functions. We first remark that the ratio distance is not necessarily a metric. Consider Markov chains with state space S = {s, s′} and let ε ∈ (0, 1/7). For k = 1, 2, 5 consider the transition function δ_k such that δ_k(t)(s) = 1 − δ_k(t)(s′) = k · ε, for all t ∈ S. Let G_k be the Markov chain with transition function δ_k. Then we have dist_R(G_1, G_2) = 1, dist_R(G_2, G_5) = 3/2, and dist_R(G_1, G_5) = 4, and hence dist_R(G_1, G_2) + dist_R(G_2, G_5) < dist_R(G_1, G_5). The above example is from [25]. Also note that dist_R is only defined for structurally equivalent game structures, and without the assumption dist_R is ∞. We also remark that the absolute distance, which measures the difference in the transition probabilities, is the most intuitive measure for the difference of two game structures.

Proposition 1. Let G_1 be a game structure (resp. Markov chain) such that the minimum positive transition probability is η > 0. For all game structures (resp. Markov chains) G_2 ∈ [[G_1]]_≡ we have dist_R(G_1, G_2) ≤ dist_A(G_1, G_2)/η.

Notation for fixing strategies. Given a concurrent game structure G = ⟨S, A, Γ_1, Γ_2, δ⟩, let π_1 be a randomized memoryless strategy.
Fixing the strategy π_1 in G, we obtain a player-2 MDP, denoted G ↾ π_1, defined as follows: (1) the state space is S; (2) for all s ∈ S we have Γ_1(s) = {⊥} (hence it is a player-2 MDP); (3) the new transition function δ_{π_1} is defined as follows: for all s ∈ S and all b ∈ Γ_2(s) we have δ_{π_1}(s, ⊥, b)(t) = Σ_{a∈Γ_1(s)} π_1(s)(a) · δ(s, a, b)(t). Similarly, if we fix a randomized memoryless strategy π_1 in an MDP G, we obtain a Markov chain, denoted G ↾ π_1. The following proposition is straightforward to verify from the definitions.

Proposition 2. Let G_1 and G_2 be two concurrent game structures (resp. MDPs) that are structurally equivalent. Let π_1 be a randomized memoryless strategy. Then dist_A(G_1 ↾ π_1, G_2 ↾ π_1) ≤ dist_A(G_1, G_2) and dist_R(G_1 ↾ π_1, G_2 ↾ π_1) ≤ dist_R(G_1, G_2).

3 Markov Chains with Multi-discounted and Parity Objectives

In this section we consider Markov chains with multi-discounted and parity objectives. We present a bound on the difference of the value functions of two structurally equivalent Markov chains that depends on the distance between the Markov chains and is independent of the discount factors. The result for parity objectives is then a consequence of our result for multi-discounted objectives and Theorem 2. Our result crucially depends on a result of Friedlin and Wentzell for Markov chains; we present this result below, and then use it to obtain the main result of the section.

Result of Friedlin and Wentzell. Let (S, δ) be a Markov chain and let s_0 be the initial state. Let C ⊂ S be a proper subset of S, and let us denote by ex_C = inf{n ∈ N | θ_n ∉ C} the first hitting time to the set S \ C of states (or the first exit time from the set C) (recall that θ_n is the random variable denoting the n-th state of a path). Let F(C, S) = {f : C ↦ S} denote the set of all functions from C to S. For every f ∈ F(C, S) we define a directed graph G_f = (S, E_f) where (s, t) ∈ E_f iff f(s) = t.
Let α_f = 1 if the directed graph G_f has no directed cycles (i.e., G_f is a directed acyclic graph), and α_f = 0 otherwise. Observe that since f is a function, for every s ∈ C there is exactly one path that starts at s. For every s ∈ C and every t ∈ S, let β_f(s, t) = 1 if the directed path that leaves s in G_f reaches t, and otherwise β_f(s, t) = 0. We now state a result that can be obtained as a special case of the result of Friedlin and Wentzell [17]. Below we use the formulation of the result as presented in [25] (Lemma 2 of [25]).

Theorem 3 (Friedlin-Wentzell result [17]). Let (S, δ) be a Markov chain, and let C ⊂ S be a proper subset of S such that Pr_s(ex_C < ∞) > 0 for every s ∈ C (i.e., from all s ∈ C, with positive probability the first hitting time to the complement set is finite). Then for every initial state s_1 ∈ C and for every t ∉ C we have

  Pr_{s_1}(θ_{ex_C} = t) = [ Σ_{f∈F(C,S)} α_f · β_f(s_1, t) · Π_{s∈C} δ(s)(f(s)) ] / [ Σ_{f∈F(C,S)} α_f · Π_{s∈C} δ(s)(f(s)) ];   (1)

in other words, the probability that the exit state is t when the starting state is s_1 is given by the expression on the right-hand side (very informally, the right-hand side is the normalized polynomial expression for exit probabilities).

Value function difference for Markov chains. We will use the result of Theorem 3 to obtain bounds on the value functions of Markov chains. We start with the notion of mean-discounted time.

Mean-discounted time. Given a Markov chain (S, δ) and a discount vector λ, we define for every state s ∈ S the mean-discounted time the process spends in the state s.
We first define the mean-discounted time function MDT(λ, s) : Ω ↦ R that maps every path to the mean-discounted time that the state s is visited; the function is formally defined as follows: for a path ω = s_0 s_1 s_2 ... we have

  MDT(λ, s)(ω) = [ Σ_{j=0}^∞ (Π_{i=0}^j λ(s_i)) · 1_{s_j=s} ] / [ Σ_{j=0}^∞ (Π_{i=0}^j λ(s_i)) ],

where 1_{s_j=s} is the indicator function. The expected mean-discounted time function for a Markov chain G with transition function δ is defined as follows: MT(s_1, s, G, λ) = E_{s_1}[MDT(λ, s)]; i.e., it is the expected mean-discounted time for s when the starting state is s_1, where the expectation measure is defined by the Markov chain with transition function δ. We now present a lemma that shows that the value function for multi-discounted Markov chains can be expressed as a ratio of two polynomials (the result is obtained as a simple extension of a result of Solan [25]).

Lemma 1. For Markov chains defined on state space S, for all initial states s_0, for all states s, and for all discount vectors λ, there exist two polynomials g_1(·) and g_2(·) in |S|^2 variables x_{t,t′}, where t, t′ ∈ S, such that the following conditions hold:

1. the polynomials have degree at most |S| with non-negative coefficients; and

2. for all transition functions δ over S we have MT(s_0, s, G, λ) = g_1(δ)/g_2(δ), where G = (S, δ), and g_1(δ) and g_2(δ) denote the values of the polynomials g_1 and g_2 when each variable x_{t,t′} is instantiated with the value δ(t)(t′) given by the transition function δ.

Proof. (Sketch). We present a sketch of the proof (details in the appendix). Fix a discount vector λ. We construct a Markov chain Ḡ = (S̄, δ̄) as follows: S̄ = S ∪ S_1, where S_1 is a copy of the states of S (and for a state s ∈ S we denote its corresponding copy as s_1); the transition function δ̄ is defined below:

1. δ̄(s_1)(s_1) = 1 for all s_1 ∈ S_1 (i.e., all copy states are absorbing);

2. for s ∈ S we have

  δ̄(s)(t) = 1 − λ(s) if t = s_1;  λ(s) · δ(s)(t) if t ∈ S;  0 if t ∈ S_1 \ {s_1};

i.e., from s the chain goes to the copy s_1 with probability 1 − λ(s), and it follows the transition function δ in the original copy with probabilities multiplied by λ(s).
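As a concrete aside (not part of the proof), the construction of Ḡ can be written out directly. The sketch below uses our own names and example numbers: it builds δ̄ from a chain (S, δ) and a discount vector λ, and checks that the result is a Markov chain whose copy states are absorbing.

```python
def auxiliary_chain(S, delta, lam):
    """Build (S-bar, delta-bar) from the proof sketch: each s in S gets an
    absorbing copy, reached from s with probability 1 - lam(s); transitions
    inside the original copy are scaled by lam(s)."""
    copy = {s: s + "_1" for s in S}        # s_1: the copy of s
    dbar = {}
    for s in S:
        row = {copy[s]: 1 - lam[s]}        # exit to the copy of s
        for t, p in delta[s].items():      # stay in the original copy
            row[t] = lam[s] * p
        dbar[s] = row
        dbar[copy[s]] = {copy[s]: 1.0}     # copies are absorbing
    return dbar

# Illustrative two-state chain with per-state discount factors.
S = ["u", "v"]
delta = {"u": {"u": 0.3, "v": 0.7}, "v": {"u": 0.4, "v": 0.6}}
lam = {"u": 0.9, "v": 0.8}

dbar = auxiliary_chain(S, delta, lam)
# Every row is a probability distribution and every copy is absorbing.
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in dbar.values())
assert dbar["u_1"] == {"u_1": 1.0}
print(sorted(dbar))   # ['u', 'u_1', 'v', 'v_1']
```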
We first show that for all s_0 and s we have MT(s_0, s, G, λ) = Pr^{δ̄}_{s_0}(θ_{ex_S} = s_1); i.e., the expected mean-discounted time in s when the original Markov chain starts in s_0 is the probability in the Markov chain (S̄, δ̄) that the first hitting state out of S is the copy s_1 of the state s. The claim is easy to verify, as both (MT(s_0, s, G, λ))_{s_0∈S} and (Pr^{δ̄}_{s_0}(θ_{ex_S} = s_1))_{s_0∈S} are the unique solution of the following system of linear equations: for all t ∈ S,

  y_t = (1 − λ(t)) · 1_{t=s} + Σ_{z∈S} λ(t) · δ(t)(z) · y_z.

We now claim that Pr^{δ̄}_{s_0}(ex_S < ∞) > 0 for all s_0 ∈ S. This follows since for all s ∈ S we have δ̄(s)(s_1) = 1 − λ(s) > 0, and since s_1 ∉ S we have Pr^{δ̄}_{s_0}(ex_S = 2) = (1 − λ(s_0)) > 0. Now we observe that we can apply Theorem 3 to the Markov chain Ḡ = (S̄, δ̄) with S as the set C of states of Theorem 3, and obtain the result. Indeed the terms α_f and β_f(s, t) are independent of δ, and the two products of Equation (1) each contain at most |S| terms of the form δ̄(s)(t) for s, t ∈ S̄. Thus the desired result follows.
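The two claims of the proof sketch can be sanity-checked numerically on a small example (our own names and numbers, not the paper's): we solve the linear system above for MT(s_0, s, G, λ) exactly with rationals, rebuild the auxiliary chain with its absorbing copies, and compare against the exit probability computed by brute force over Equation (1) with C = S.

```python
from fractions import Fraction as F
from itertools import product

# A two-state chain with per-state discount factors (illustrative numbers).
S = ["u", "v"]
delta = {"u": {"u": F(3, 10), "v": F(7, 10)}, "v": {"u": F(2, 5), "v": F(3, 5)}}
lam = {"u": F(9, 10), "v": F(4, 5)}

def mt(s0, s):
    """MT(s0, s, G, lam) via the linear system of the proof sketch,
    y_t = (1 - lam(t)) * 1[t = s] + lam(t) * sum_z delta(t)(z) * y_z,
    solved for the 2x2 case by Cramer's rule."""
    a11 = 1 - lam["u"] * delta["u"]["u"]; a12 = -lam["u"] * delta["u"]["v"]
    a21 = -lam["v"] * delta["v"]["u"];    a22 = 1 - lam["v"] * delta["v"]["v"]
    b = {"u": (1 - lam["u"]) * (s == "u"), "v": (1 - lam["v"]) * (s == "v")}
    det = a11 * a22 - a12 * a21
    y = {"u": (b["u"] * a22 - a12 * b["v"]) / det,
         "v": (a11 * b["v"] - b["u"] * a21) / det}
    return y[s0]

# Auxiliary chain: absorbing copies u1, v1; the set C of Theorem 3 is {u, v}.
Sbar = ["u", "v", "u1", "v1"]
dbar = {
    "u": {"u1": 1 - lam["u"], "u": lam["u"] * delta["u"]["u"], "v": lam["u"] * delta["u"]["v"]},
    "v": {"v1": 1 - lam["v"], "u": lam["v"] * delta["v"]["u"], "v": lam["v"] * delta["v"]["v"]},
    "u1": {"u1": F(1)}, "v1": {"v1": F(1)},
}

def exit_prob(s1, t, C=("u", "v")):
    """Pr_{s1}(theta_{ex_C} = t) by brute force over Equation (1)."""
    num = den = F(0)
    for choice in product(Sbar, repeat=len(C)):
        f = dict(zip(C, choice))
        def follow(s):                    # follow f until leaving C; None on a cycle
            seen = set()
            while s in C:
                if s in seen:
                    return None
                seen.add(s)
                s = f[s]
            return s
        if any(follow(s) is None for s in C):
            continue                      # alpha_f = 0: G_f has a directed cycle
        w = F(1)
        for s in C:
            w *= dbar[s].get(f[s], F(0))  # product of delta-bar(s)(f(s))
        den += w
        if follow(s1) == t:               # beta_f(s1, t) = 1
            num += w
    return num / den

for s0 in S:
    for s in S:
        assert mt(s0, s) == exit_prob(s0, s + "1")   # exact match with rationals
print(mt("u", "u") + mt("u", "v"))   # 1  (mean-discounted times sum to one)
```

With exact rational arithmetic the two computations agree exactly, matching the claim MT(s_0, s, G, λ) = Pr^{δ̄}_{s_0}(θ_{ex_S} = s_1); the final print also confirms that the mean-discounted times over all states sum to one, since the numerators of MDT(λ, s) over all s add up to the denominator.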
