ebook img

The Complexity of Partial-observation Stochastic Parity Games With Finite-memory Strategies PDF

0.16 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview The Complexity of Partial-observation Stochastic Parity Games With Finite-memory Strategies

The Complexity of Partial-observation Stochastic Parity Games With Finite-memory Strategies∗ KrishnenduChatterjee† LaurentDoyen§ SumitNain‡ MosheY. Vardi‡ † ISTAustria 4 § CNRS,LSV,ENSCachan 1 0 ‡ RiceUniversity, USA 2 n a J 4 Abstract 1 We consider two-player partial-observation stochastic games on finite-state graphs where player 1 has ] O partial observation and player 2 has perfect observation. The winning condition we study are ω-regular conditions specified as parity objectives. The qualitative-analysis problem given a partial-observation L stochastic game and a parity objective asks whether there is a strategy to ensure that the objective is . s satisfied with probability 1 (resp. positive probability). These qualitative-analysis problems are known c [ to be undecidable. However in many applications the relevant question is the existence of finite-memory strategies, and the qualitative-analysis problems under finite-memory strategies was recently shown to be 1 decidable in 2EXPTIME. We improve the complexity and show that the qualitative-analysis problems for v 9 partial-observationstochasticparitygamesunderfinite-memorystrategiesareEXPTIME-complete;andalso 8 establishoptimal(exponential)memoryboundsforfinite-memorystrategiesrequiredforqualitativeanalysis. 2 3 1 Introduction . 1 Gamesongraphs. Two-playerstochasticgamesonfinitegraphsplayedforinfiniteroundsiscentralinmanyareas 0 4 ofcomputer science as they provide anatural setting tomodel nondeterminism and reactivity inthe presence of 1 randomness. In particular, infinite-duration games with omega-regular objectives are a fundamental tool in the : v analysisofmanyaspectsofreactivesystemssuchasmodeling,verification,refinement,andsynthesis[1,17]. For i X example, thestandard approach tothesynthesis problem forreactive systemsreduces theproblem tofindingthe r winning strategy of a suitable game [23]. The most common approach to games assumes a setting with perfect a information, where both players have complete knowledge of the state of the game. In many settings, however, the assumption of perfect information is not valid and it is natural to allow an information asymmetry between theplayers, suchas,controllers withnoisysensorsandsoftwaremodulesthatexposepartialinterfaces [24]. Partial-observation stochastic games. Partial-observation stochastic games are played between two players (player1andplayer2)onagraphwithfinitestatespace. Thegameisplayedforinfinitelymanyroundswherein eachround either player 1chooses amoveorplayer 2chooses amove,andthe successor state isdetermined by aprobabilistic transition function. Player1haspartialobservation wherethestatespaceispartitioned according to observations that she can observe i.e., given the current state, the player can only view the observation of the state (thepartition the statebelongs to), butnot theprecise state. Player 2,the adversary toplayer 1, hasperfect observation andcanobservetheprecisestate. ∗ThisresearchwassupportedbyAustrianScienceFund(FWF)GrantNoP23499-N23,FWFNFNGrantNoS11407-N23(RiSE),ERCStartgrant (279307:GraphGames),MicrosoftFacultyFellowshipAward,NSFgrantsCNS1049862andCCF-1139011,byNSFExpeditionsinComputingproject ”ExCAPE:ExpeditionsinComputerAugmentedProgramEngineering”,byBSFgrant9800096,andbygiftfromIntel. The class of ω-regular objectives. An objective specifies the desired set of behaviors (or paths) for player 1. In verification and control of stochastic systems an objective is typically an ω-regular set of paths. The class ofω-regular languages extends classical regularlanguages toinfinite strings, andprovides arobust specification languagetoexpressallcommonlyusedspecifications[25]. Inaparityobjective,everystateofthegameismapped to a non-negative integer priority and the goal is to ensure that the minimum priority visited infinitely often is even. Parity objectives are a canonical way to define such ω-regular specifications. Thus partial-observation stochastic gameswithparityobjectiveprovideageneral frameworkforanalysisofstochastic reactivesystems. Qualitative and quantitative analysis. Given a partial-observation stochastic game with a parity objective and a startstate,thequalitative-analysis problemaskswhethertheobjectivecanbeensuredwithprobability1(almost- surewinning)orpositiveprobability (positivewinning);whereasthemoregeneralquantitative-analysis problem askswhethertheobjective canbesatisfiedwithprobability atleastλforagiventhreshold λ ∈ (0,1). Previousresults. Thequantitativeanalysisproblemforpartial-observationstochasticgameswithparityobjectives is undecidable, even for the very special case of probabilistic automata with reachability objectives [22]. The qualitative-analysis problems for partial-observation stochastic games with parity objectives are also undecidable [2], even for probabilistic automata. In many practical applications, however, the more relevant question is the existence of finite-memory strategies. The quantitative analysis problem remains undecidable for finite-memory strategies, even for probabilistic automata [22]. Thequalitative-analysis problems for partial- observation stochastic parity gameswereshowntobedecidable with2EXPTIMEcomplexity forfinite-memory strategies [21];andtheexactcomplexityoftheproblemswasopenwhichwesettleinthiswork. Ourcontributions. Ourcontributions areasfollows: forthequalitative-analysis problemsforpartial-observation stochastic parity games under finite-memory strategies we show that (i) the problems are EXPTIME-complete; and(ii)ifthereisafinite-memoryalmost-sure (resp. positive)winningstrategy, thenthereisastrategy thatuses at most exponential memory (matching the exponential lower bound known for the simpler case of reachability andsafetyobjectives). Thusweestablishbothoptimalcomputational andstrategycomplexityresults. Moreover, once afinite-memory strategy isfixedforplayer 1,weobtain afinite-state perfect-information Markov decision process (MDP)forplayer 2wherefinite-memory isaspowerful asinfinite-memory [13]. Thusourresults apply tobothcaseswhereplayer2hasinfinite-memory orrestricted tofinite-memorystrategies. Technical contribution. The 2EXPTIME upper bound of [21] is achieved via a reduction to the emptiness problem of alternating parity tree automata. The reduction of [21] to alternating tree automata is exponential as it requires enumeration of the end components and recurrent classes that can arise after fixing strategies. We presentapolynomialreduction, whichisachievedintwosteps. Thefirststepisasfollows: alocalgadget-based reduction(thattransformseveryprobabilisticstatetoalocalgadgetofdeterministicstates)forperfect-observation stochastic games to perfect-observation deterministic games for parity objectives was presented in [12, 6]. This gadget, however, requires perfect observation for both players. We extend this reduction and present a local gadget-based polynomial reduction of partial-observation stochastic games to three-player partial-observation deterministic games, whereplayer 1has partial observation, the other twoplayers have perfect observation, and player3ishelpful toplayer1. Thecruxoftheproofistoshowthatthelocalreduction allowstoinferproperties about recurrent classes and end components (which are global properties). In the second step we present a polynomial reduction of the three-player games problem to the emptiness problem of alternating tree automata. Wealsoremarkthatthenewmodelofthree-player gamesweintroducefortheintermediatestepofthereduction maybealsoofindependent interestformodelingofotherapplications. Related works. The undecidability of the qualitative-analysis problem for partial-observation stochastic parity games with infinite-memory strategies follows from [2]. For partially observable Markov decision processes (POMDPs), which is a special case of partial-observation stochastic games where player 2 does not have any choices, the qualitative-analysis problem for parity objectives with finite-memory strategies was shown to be EXPTIME-complete[7]. Forpartial-observation stochastic games the almost-sure winning problem wasshown to be EXPTIME-completefor Bu¨chi objectives (both for finite-memory and infinite-memory strategies) [11, 8]. Finally, for partial-observation stochastic parity games the almost-sure winning problem under finite-memory strategies wasshowntobedecidable in2EXPTIMEin[21]. Summary and discussion. The results for the qualitative analysis of various models of partial-observation stochastic parity games with finite-memory strategies for player 1 is summarized in Table 1. We explain the results of the table. The results of the first row follows from [7] and the results for the second row are the results of our contributions. In the most general case both players have partial observation [3]. If we consider partial-observation stochastic games where both players have partial observation, then the results of the table are derived as follows: (a) If we consider infinite-memory strategies for player 2, then the problem remains undecidable as when player 1 is non-existent we obtain POMDPsas a special case. The non-elementary lower bound followsfromtheresults of[8]wherethelowerboundwasshownforreachability objectives wherefinite- memorystrategies sufficeforplayer 1(against both finiteandinfinite-memory strategies forplayer 2). (b)Ifwe consider finite-memory strategies for player 2, then the decidability of the problem is open, but we obtain the non-elementary lowerboundonmemoryfromtheresultsof[8]forreachability objectives. GameModels Complexity Memorybounds POMDPs EXPTIME-complete[7] Exponential [7] Player1partialandplayer2perfect EXPTIME-complete Exponential (finite-orinfinite-memoryforplayer2) Bothplayers partial Undecidable [2] Non-elementary [8] infinite-memory forplayer2 (Lowerbound) Bothplayers partial Open(??) Non-elementary [8] finite-memoryforplayer2 (Lowerbound) Table 1: Complexity and memory bounds for qualitative analysis of partial-observation stochastic parity games withfinite-memorystrategies forplayer1. Thenewresultsareboldfaced. 2 Partial-observation StochasticParityGames Weconsider partial-observation stochastic parity games where player 1 has partial observation and player 2has perfect observation. We consider parity objectives, and for almost-sure winning under finite-memory strategies for player 1 present a polynomial reduction to sure winning in three-player parity games where player 1 has partial observation, player 3 has perfect observation and is helpful towards player 1, and player 2 has perfect observation and is adversarial to player 1. A similar reduction also works for positive winning. We then show how to solve the sure-winning problem for three-player games using alternating parity tree automata. Thus the stepsareasfollows: 1. Reduction of partial-observation stochastic parity games for almost-sure winning with finite-memory strategies to three-player parity games sure-winning problem (with player 1 partial, other two perfect, player1andplayer3existential, andplayer2adversarial). 2. Solvingthesurewinningproblem forthree-player paritygamesusingalternating paritytreeautomata. Inthissection wepresent thedetails ofthefirststep. Thesecond stepisgiveninthefollowingsection. 2.1 Basicdefinitions Westartwithbasicdefinitionsrelatedtopartial-observation stochastic paritygames. Partial-observation stochasticgames. Weconsider slightlydifferentnotation(thoughequivalent)totheclassical definitions, buttheslightly different notation helps formoreelegant andexplicit reduction. Weconsider partial- 3 observation stochastic games as a tuple G = (S ,S ,S ,A ,δ,E,O,obs) as follows: S = S ∪ S ∪ S 1 2 P 1 1 2 P is the state space partitioned into player-1 states (S ), player-2 states (S ), and probabilistic states (S ); and 1 2 P A is a finite set of actions for player 1. Since player 2 has perfect observation, she chooses edges instead 1 of actions. The transition function is as follows: δ : S × A → S that given a player-1 state in S 1 1 2 1 and an action in A gives the next state in S (which belongs to player 2); and δ : S → D(S ) given a 1 2 P 1 probabilistic stategivestheprobability distribution overthesetofplayer-1 states. Thesetofedgesisasfollows: E = {(s,t) | s ∈ S ,t ∈ S ,δ(s)(t) > 0}∪E′,whereE′ ⊆ S ×S . Theobservation setO and observation P 1 2 P mapping obs are standard, i.e., obs : S → O. Note that player 1 plays after every three steps (every move of player1isfollowedbyamoveofplayer2,thenaprobabilistic choice). Inotherwords,firstplayer1chooses an action,thenplayer2choosesanedge,andthenthereisaprobabilitydistributionoverstateswhereplayer1again choosesandsoon. Three-player non-stochastic turn-based games. We consider three-player partial-observation (non-stochastic turn-based) games as a tuple G = (S ,S ,S ,A ,δ,E,O,obs) as follows: S is the state space partitioned into 1 2 3 1 player-1 states (S ), player-2 states (S ), and player-3 states (S ); and A is a finite set of actions for player 1. 1 2 3 1 Thetransitionfunctionisasfollows: δ : S ×A → S thatgivenaplayer-1stateinS andanactioninA gives 1 1 2 1 1 the next state (which belongs to player 2). The set of edges is as follows: E ⊆ (S ∪S )×S. Hence in these 2 3 games player 1 chooses an action, and the other players have perfect observation and choose edges. We only considerthesub-classwhereplayer1playsineveryk-steps,forafixedk. TheobservationsetOandobservation mappingobsareagainstandard. Plays and strategies. A play in a partial-observation stochastic game is an infinite sequence of states s s s ... 0 1 2 such that the following conditions hold for all i ≥ 0: (i) if s ∈ S , then there exists a ∈ A such that i 1 i 1 s = δ(s ,a ); and (ii) if s ∈ (S ∪ S ), then (s ,s ) ∈ E. The function obs is extended to sequences i+1 i i i 2 P i i+1 ρ = s ...s of states in the natural way, namely obs(ρ) = obs(s )...obs(s ). A strategy for a player is a 0 n 0 n recipe toextend theprefix of aplay. Formally, player-1 strategies are functions σ : S∗ ·S → A ;and player-2 1 1 (andanalogously player-3strategies)arefunctions: π : S∗·S → S suchthatforallw ∈ S∗ands ∈ S wehave 2 2 (s,π(w ·s)) ∈ E. We consider only observation-based strategies for player 1, i.e., for two play prefixes ρ and ρ′ ifthecorresponding observation sequences match(obs(ρ) = obs(ρ′)),thenthestrategymustchoosethesame action(σ(ρ) = σ(ρ′));andtheotherplayershaveallstrategies. Thenotations forthree-player gamesaresimilar. Finite-memory strategies. A player-1 strategy uses finite-memory if it can be encoded by a deterministic transducer hM,m ,σ ,σ i where M is a finite set (the memory of the strategy), m ∈ M is the initial memory 0 u n 0 value, σ : M× O → M is the memory-update function, and σ : M → A is the next-move function. The u n 1 sizeofthestrategy isthenumber|M|ofmemoryvalues. Ifthecurrent observation iso,andthecurrent memory value is m,then the strategy chooses the next action σ (m), and thememory is updated to σ (m,o). Formally, n u hM,m ,σ ,σ i defines the strategy σ such that σ(ρ · s) = σ (σ (m ,obs(ρ) · obs(s)) for all ρ ∈ S∗ and 0 u n n u 0 b s ∈ S , where σ extends σ to sequences of observations as expected. This definition extends to infinite- 1 u u b memorystrategiesbydropping theassumption thatthesetMisfinite. Parity objectives. An objective for Player 1 in G is a set ϕ ⊆ Sω of infinite sequences of states. A play ρ satisfies the objective ϕ if ρ ∈ ϕ. For a play ρ = s s ... we denote by Inf(ρ) the set of states 0 1 that occur infinitely often in ρ, that is, Inf(ρ) = {s | s = sforinfinitely manyj’s}. For d ∈ N, let j p : S → {0,1,...,d} be a priority function, which maps each state to a nonnegative integer priority. The parity objective Parity(p) requires that the minimum priority that occurs infinitely often be even. Formally, Parity(p) = {ρ | min{p(s) | s ∈ Inf(ρ)}iseven}. Parity objectives are a canonical way to express ω-regular objectives [25]. Almost-sure winning and positive winning. An event is a measurable set of plays. For a partial-observation stochasticgame,givenstrategiesσandπforthetwoplayers,theprobabilitiesofeventsareuniquelydefined[26]. ForaparityobjectiveParity(p),wedenotebyPσ,π(Parity(p))theprobabilitythatParity(p)issatisfiedbytheplay s obtained fromthestarting stateswhenthestrategies σ andπ areused. Thealmost-sure (resp. positive) winning problem under finite-memory strategies asks, given a partial-observation stochastic game, a parity objective Parity(p), and a starting state s, whether there exists a finite-memory observation-based strategy σ for player 1 such that against all strategies π for player 2 we have Pσ,π(Parity(p)) = 1 (resp. Pσ,π(Parity(p)) > 0). The s s almost-sureandpositivewinningproblemsarealsoreferredtoasthequalitative-analysis problemsforstochastic games. Surewinning inthree-player games. Inthree-player gamesonce thestarting state sandstrategies σ,π, andτ of σ,π,τ thethreeplayersarefixedweobtainauniqueplay,whichwedenoteasρ . Inthree-playergamesweconsider s the following sure winning problem: given a parity objective Parity(p), sure winning is ensured if there exists afinite-memory observation-based strategy σ forplayer 1, such that inthe two-player perfect-observation game obtained after fixing σ, player 3 can ensure the parity objective against all strategies of player 2. Formally, the sure winning problem asks whether there exist a finite-memory observation-based strategy σ for player 1 and a σ,π,τ strategyτ forplayer3,suchthatforallstrategies π forplayer2wehaveρ ∈ Parity(p). s REMARK 1. (EQUIVALENCE WITH STANDARD MODEL) We remark that for the model of partial-observation stochastic games studied in literature the two players simultaneously choose actions, and a probabilistic transition function determine the probability distribution ofthe next state. Inour model, the gameis turn-based and theprobability distribution ischosen only in probabilistic states. However, itfollows from theresults of[9] that the models are equivalent: by the results of [9, Section 3.1] the interaction of the players and probability can be separated without loss of generality; and [9, Theorem 4] shows that in presence of partial observation, concurrent gamescanbereducedtoturn-based gamesinpolynomialtime. Thustheturn-based modelwherethe movesoftheplayers andstochastic interaction areseparated isequivalent tothestandard model. Moreover, for a perfect-information player choosing an action is equivalent to choosing an edge in a turn-based game. Thus themodelweconsider isequivalent tothestandard partial-observation gamemodels. REMARK 2. (PURE AND RANDOMIZED STRATEGIES) Inthisworkweonlyconsiderpurestrategies. Inpartial- observation games, randomized strategies are also relevant as they are more powerful than pure strategies. However, for finite-memory strategies the almost-sure and positive winning problem for randomized strategies can be reduced in polynomial time to the problem for finite-memory pure strategies [8, 21]. Hence without loss ofgenerality weonlyconsider purestrategies. 2.2 Reduction of partial-observation stochastic games to three-player games In this section we present a polynomial-time reductionforthealmost-surewinningprobleminpartial-observation stochastic paritygamesto thesurewinningprobleminthree-player paritygames. Reduction. Let us denote by [d] the set {0,1,...,d}. Given a partial-observation stochastic parity game graph G = (S ,S ,S ,A ,δ,E,O,obs)withaparity objective defined bypriority function p : S → [d]weconstruct 1 2 P 1 athree-playergamegraphG = (S ,S ,S ,A ,δ,E,O,obs)togetherwithpriorityfunctionp. Theconstruction 1 2 3 1 isspecifiedasfollows. 1. Foreverynonprobabilistic states ∈S ∪S ,thereisacorresponding states ∈ S suchthat 1 2 • s ∈ S ifs ∈ S ,elses ∈ S ; 1 1 2 • p(s)= p(s)andobs(s) = obs(s); • δ(s,a) = twheret = δ(s,a),fors ∈ S anda ∈ A ;and 1 1 • (s,t) ∈ E iff(s,t) ∈ E,fors ∈ S . 2 2. Every probabilistic state s ∈ S is replaced by the gadget shown in Figure 1 and Figure 2. In the figure, P square-shaped statesareplayer-2 states(inS ),andcircle-shaped (orellipsoid-shaped) statesareplayer-3 2 5 states (in S ). Formally, from the state s with priority p(s) and observation obs(s) (i.e., p(s) = p(s)and 3 obs(s) = obs(s))theplayers playthefollowingthree-step gameinG. • First,instatesplayer2choosesasuccessor (s,2k),for2k ∈ {0,1,...,p(s)+1}. e • For every state (s,2k), we have p((s,2k)) = p(s) and obs((s,2k)) = obs(s). For k ≥ 1, e e e in state (s,2k) player 3 chooses between two successors: state (s,2k − 1) with priority 2k − 1 e b and same observation as s, or state (s,2k) with priority 2k and same observation as s, (i.e., b p((s,2k − 1)) = 2k − 1, p((s,2k)) = 2k, and obs((s,2k − 1)) = obs((s,2k)) = obs(s)). The b b b b state(s,0)hasonlyonesuccessor (s,0),withp((s,0)) = 0andobs((s,0)) = obs(s). e b b b • Finally, in each state (s,k) the choice is between all states t such that (s,t) ∈ E, and it belongs to b player 3 (i.e., in S )ifk isodd, and to player 2(i.e., in S ) ifk is even. Notethat every state inthe 3 2 gadgethasthesameobservation astheoriginalstate. WedenotebyG = Tr (G)thethree-player game,whereplayer1haspartial-observation, andbothplayer2and as player 3 have perfect-observation, obtained from a partial-observation stochastic game. Also observe that in G thereareexactlyfourstepsbetweentwoplayer1moves. Observation sequence mapping. Note that since in our partial-observation games first player 1 plays, then player 2,followed byprobabilistic states, repeated adinfinitum, wlog, wecanassume thatforeveryobservation o ∈ O we have either (i) obs−1(o) ⊆ S ; or (ii) obs−1(o) ⊆ S ; or (i) obs−1(o) ⊆ S . Thus we partition the 1 2 P observationsasO ,O ,andO . Givenanobservationsequenceκ= o o o ...o inGcorrespondingtoafinite 1 2 P 0 1 2 n prefixof aplay, weinductively definethe sequence κ = h(κ) inGas follows: (i)h(o ) = o ifo ∈ O ∪O , 0 0 0 1 2 elseo o o ;(ii)h(o o ...o ) = h(o o ...o )o ifo ∈ O ∪O ,elseh(o o ...o )o o o . Intuitively 0 0 0 0 1 n 0 1 n−1 n n 1 2 0 1 n−1 n n n the mapping takes care of the two extra steps of the gadgets introduced for probabilistic states. The mapping is abijection, andhence given anobservation sequence κofaplay prefix inGweconsider theinverse play prefix −1 κ= h (κ)suchthath(κ) = κ. Strategy mapping. Given an observation-based strategy σ in G we consider a strategy σ = Tr (σ) as follows: as for an observation sequence κ corresponding to a play prefix in G we have σ(κ) = σ(h(κ)). The strategy σ is observation-based (since σ is observation-based). The inverse mapping Tr −1 of strategies from G to G as is analogous. Note that for σ in G we have Tr (Tr −1(σ)) = σ. Let σ be a finite-memory strategy with as as memory M for player 1 in the game G. The strategy σ can be considered as a memoryless strategy, denoted as σ∗ = MemLess(σ),inG×M(thesynchronous productofGwithM). Givenastrategy(purememoryless)π for player 2inthe2-player gameG×M,astrategy π = Tr (π)inthepartial-observation stochastic gameG×M as isdefinedasfollows: π((s,m)) = (t,m′), ifandonlyifπ((s,m)) = (t,m′); foralls ∈ S . 2 Endcomponent and the key property. Givenan MDP,aset U isan end component inthe MDPifthe sub-graph induced byU isstrongly connected, andforallprobabilistic states inU allout-going edges end upinU (i.e., U isclosed forprobabilistic states). Thekey property about MDPsthatisused inourproofs isaresult established by [13, 14]that given an MDP,for all strategies, with probability 1 the set of states visited infinitely often is an endcomponent. Thekeyproperty allowsustoanalyzeendcomponents ofMDPsandfromproperties oftheend component concludeproperties aboutallstrategies. The key lemma. We are now ready to present our main lemma that establishes the correctness of the reduction. Sincetheproofofthelemmaislongwesplittheproofintotwoparts. LEMMA 2.1. Given a partial-observation stochastic parity game G with parity objective Parity(p), let G = Tr (G) be the three-player game with the modified parity objective Parity(p) obtained by our reduction. as s p(s) ... (s,0) p(s) (s,2) p(s) (s,4) p(s) ... (s,p(s)) p(s) e e e e (s,0) 0 (s,1) 1 (s,2) 2 (s,3) 3 (s,4) 4 ... (s,p(s)−1) (s,p(s)) b b b b b b b p(s)−1 p(s) · · · · · · · · · · · · · · E(s) E(s) E(s) E(s) E(s) E(s) E(s) Figure1: Reduction gadgetwhenp(s)iseven. s p(s) ... (s,0) p(s) (s,2) p(s) (s,4) p(s) ... (s,p(s)+1) p(s) e e e e (s,0) 0 (s,1) 1 (s,2) 2 (s,3) 3 (s,4) 4 ... (s,p(s)) p(s) b b b b b b · · · · · · · · · · · · E(s) E(s) E(s) E(s) E(s) E(s) Figure2: Reduction gadgetwhenp(s)isodd. 7 Consider a finite-memory strategy σ with memory M for player 1 in G. Let us denote by G the perfect- σ observationtwo-playergameplayedoverG×Mbyplayer2andplayer3afterfixingthestrategyσforplayer1. Let σ U = {(s,m) ∈ S ×M |player3hasasurewinningstrategy fortheobjective Parity(p)from(s,m)inG }; 1 σ σ σ andletU = (S×M)\U bethesetofsurewinningstatesforplayer2inG . Considerthestrategyσ = Tr (σ), 2 1 σ as andthesetsUσ = {(s,m) ∈ S ×M |(s,m) ∈ Uσ};andUσ =(S ×M)\Uσ. Thefollowingassertions hold. 1 1 2 1 1. Forall(s,m) ∈ Uσ,forallstrategies π ofplayer2wehavePσ,π (Parity(p)) = 1. 1 (s,m) 2. Forall(s,m) ∈ Uσ,thereexistsastrategyπ ofplayer2suchthatPσ,π (Parity(p)) < 1. 2 (s,m) Wefirstpresenttheproofforpart1andthenforpart2. Proof. [(ofLemma2.1: part1).] Considerafinite-memorystrategyσforplayer1withmemoryMinthegameG. Oncethestrategyσ isfixedweobtainthetwo-playerfinite-stateperfect-observation gameG (betweenplayer3 σ andtheadversaryplayer2). Recallthesurewinningsets σ U = {(s,m) ∈ S ×M |player3hasasurewinningstrategy fortheobjectiveParity(p)from(s,m)inG } 1 σ σ σ for player 3, and U = (S ×M)\U for player 2, respectively, in G . Letσ = Tr (σ) be the corresponding 2 1 σ as strategy in G. We denote by σ∗ = MemLess(σ) and σ∗ the corresponding memoryless strategies of σ in G × M and σ in G × M, respectively. We show that all states in Uσ are almost-sure winning, i.e., 1 given σ, for all (s,m) ∈ Uσ, for all strategies π for player 2 in G we have Pσ,π (Parity(p)) = 1 (recall 1 (s,m) U1σ = {(s,m) ∈ S ×M | (s,m) ∈ Uσ1}). We also consider explicitly the MDP (G×M ↾ U1σ)σ∗ to analyze strategies of player 2 on the synchronous product, i.e., we consider the player-2 MDP obtained after fixing the memorylessstrategyσ∗ inG×M,andthenrestricttheMDPtothesetUσ. 1 Two key components. The proof has two key components. First, we argue that all end components in the MDP restricted toUσ are winning forplayer 1(have minpriority even). Second weargue that given the starting state 1 (s,m) is in Uσ, almost-surely the set of states visited infinitely often is an end component in Uσ against all 1 1 strategies ofplayer2. Thesetwokeycomponents establish thedesiredresult. Winning end components. Our first goal is to show that every end component C in the player-2 MDP (G×M ↾ U1σ)σ∗ iswinningforplayer1fortheparityobjective,i.e.,theminimumpriorityofC iseven. Weargue thatifthereisanendcomponent C in(G×M ↾ U1σ)σ∗ thatiswinningforplayer 2fortheparity objective(i.e., minimumpriority ofC isodd), thenagainst anymemoryless player-3 strategy τ inG ,player 2canconstruct a σ σ cycleinthegame(G×M ↾ U1)σ∗ thatiswinningforplayer 2(i.e.,minimumpriorityofthecycleisodd)(note thatgiventhestrategyσ isfixed,wehavefinite-stateperfect-observation paritygames,andhenceintheenlarged gamewecanrestrictourselvestomemorylessstrategiesforplayer3). Thisgivesacontradictionbecauseplayer3 σ has asure winning strategy from the set U inthe 2-player parity gameG . Towardscontradiction, letC be an 1 σ endcomponent in(G×M ↾ U1σ)σ∗ thatiswinningforplayer2,andletitsminimumoddpriority be2r−1,for some r ∈ N. Then there is a memoryless strategy π′ for player 2 in the MDP (G×M ↾ U1σ)σ∗ such that C is a bottom scc (or a terminal scc) in the Markov chain graph of (G×M ↾ U1σ)σ∗,π′. Let τ be a memoryless for player3in(G×M ↾ Uσ1)σ∗. Givenτ forplayer3andstrategyπ′forplayer2inG×M,weconstructastrategyπ σ forplayer2inthegame(G×M↾ U1)σ∗ asfollows. Foraplayer-2stateinC,thestrategyπfollowsthestrategy π′, i.e., for a state (s,m) ∈ C with s ∈ S we have π((s,m)) = (t,m′) where (t,m′) = π′((s,m)). For a 2 probabilistic stateinC wedefinethestrategy asfollows(i.e.,wenowconsiderastate(s,m)∈ C withs ∈ S ): P • if for some successor state ((s,2ℓ),m′) of (s,m), the player-3 strategy τ chooses a successor ((s,2ℓ − e b 1),m′′) ∈ C at the state ((s,2ℓ),m′), for ℓ < r, then the strategy π chooses at state (s,m) the successor e ((s,2ℓ),m′);and e • otherwise thestrategy π chooses atstate(s,m)thesuccessor ((s,2r),m′),andat((s,2r),m′′)itchooses e b asuccessor shortening thedistance (i.e.,chooses asuccessor withsmallerbreadth-first-search distance) to afixedstate(s∗,m∗)ofpriority2r−1ofC (suchastate(s∗,m∗)existsinC sinceC isstronglyconnected andhasminimumpriority2r−1);andforthefixedstateofpriority2r−1thestrategychoosesasuccessor (s,m′)suchthat(s,m′)∈ C. Consideranarbitrarycycleinthesubgraph(G×M ↾ C) whereC isthesetofstatesinthegadgetsofstates σ,π,τ inC. Therearetwocases. • If there is at least one state ((s,2ℓ −1),m), with ℓ ≤ r on the cycle, then the minimum priority on the b cycle is odd, as even priorities smaller than 2r are not visited by the construction as C does not contain statesofevenpriorities smallerthan2r. • Otherwise,inallstateschoicesshorteningthedistancetothestatewithpriority2r−1aretakenandhence thecyclemustcontainapriority2r−1stateandallotherpriorities onthecycleare≥ 2r−1,so2r−1is theminimumpriority onthecycle. Henceawinningendcomponentforplayer2intheMDPcontradicts thatplayer3hasasurewinningstrategyin Gσ fromUσ1. Thusitfollowsthatallendcomponents arewinningforplayer1in(G×M↾ U1σ)σ∗. Almost-sure reachability towinning end-components. Finally, weconsider theprobability ofstaying inUσ. For 1 everyprobabilistic state(s,m)∈ (S ×M)∩Uσ,allofitssuccessorsmustbeinUσ. Otherwise,player2inthe P 1 1 σ state(s,m)ofthegameG canchoosethesuccessor(s,0)andthenasuccessortoitswinningsetU . Thisagain σ 2 e σ contradictstheassumptionthat(s,m)belongtothesurewinningstatesU forplayer3inG . Similarly,forevery 1 σ state(s,m)∈ (S ×M)∩Uσ wemusthaveallitssuccessors areinUσ. Forallstates(s,m) ∈(S ×M)∩Uσ, 2 1 1 1 1 the strategy σ chooses a successor in Uσ. Hence for all strategies π of player 2, for all states (s,m) ∈ Uσ, the 1 1 objective Safe(Uσ) (which requires that only states in Uσ are visited) is ensured almost-surely (in fact surely), 1 1 andhencewithprobability 1thesetofstatesvisited infinitelyoftenisanendcomponent inUσ (bykeyproperty 1 ofMDPs). Sinceeveryendcomponentin(G×M ↾ U1σ)σ∗ hasevenminimumpriority,itfollowsthatthestrategy σ is an almost-sure winning strategy for the parity objective Parity(p) for player 1 from all states (s,m) ∈ Uσ. 1 Thisconcludes theproofforfirstpartofthelemma. Wenowpresenttheproofforthesecondpart. Proof. [(ofLemma2.1:part 2).] Consider amemoryless surewinning strategy π forplayer 2inG from theset σ σ U . Let us consider the strategies σ = Tr (σ) and π = Tr (π), and consider the Markov chain G . Our 2 as as σ,π proof shows the following two properties to establish the claim: (1) in the Markov chain G all bottom sccs σ,π (the recurrent classes) in Uσ have odd minimum priority; and (2) from all states in Uσ some recurrent class in 2 2 Uσ isreachedwithpositiveprobability. Thisestablishes thedesired resultofthelemma. 2 Nowinningbottom sccforplayer 1inUσ. Assumetowardscontradiction thatthereisabottom sccC contained 2 in Uσ in the Markov chain G such that the minimum priority in C is even. From C we construct a winning 2 σ,π σ cycle (minimum priority is even) in U for player 3 in the game G given the strategy π. This contradicts that 2 σ π isasure winning strategy for player 2from Uσ inG . Letthe minimum priority ofC be2r for somer ∈ N. 2 σ The idea is similar to the construction of part 1. Given C, and the strategies σ and π, we construct a strategy τ forplayer3inGasfollows: Foraprobabilistic state(s,m)inC: 9 • ifπ chooses astate((s,2ℓ−2),m′),withℓ ≤ r,thenτ choosesthesuccessor ((s,2ℓ−2),m′); e b • otherwiseℓ > r(i.e.,πchoosesastate((s,2ℓ−2),m′)forℓ > r),thenτ choosesthestate((s,2ℓ−1),m′), e b and then asuccessor toshorten the distance toafixedstate withpriority 2r (such astate exists inC);and forthefixedstateofpriority2r,thestrategy τ chooses asuccessor inC. Similar to the proof of part 1, we argue that we obtain a cycle with minimum even priority in the graph σ (G × M ↾ U ) . Consider an arbitrary cycle in the subgraph (G × M ↾ C) where C is the set of 2 σ,π,τ σ,π,τ statesinthegadgetsofstatesinC. Therearetwocases. • If there is at least one state ((s,2ℓ −2),m), with ℓ ≤ r on the cycle, then the minimum priority on the b cycleiseven,asoddprioritiesstrictlysmallerthan2r+1arenotvisitedbytheconstruction asC doesnot containstatesofoddpriorities strictlysmallerthan2r+1. • Otherwise,inallstateschoicesshortening thedistancetothestatewithpriority2r aretakenandhencethe cycle must contain apriority 2r state and all other priorities on the cycle are ≥ 2r, so 2r is the minimum priority onthecycle. Thus we obtain cycles winning for player 3, and this contradicts that π is a sure winning strategy for player 2 fromUσ. Thusitfollowsthatallrecurrent classesinUσ intheMarkovchainG arewinningforplayer2. 2 2 σ,π Not almost-sure reachability to Uσ. We now argue that given σ and π there exists no state in Uσ such that Uσ 1 2 1 is reached almost-surely. This would ensure that from all states in Uσ some recurrent class in Uσ is reached 2 2 with positive probability and establish the desired claim since we have already shown that all recurrent classes in Uσ are winning for player 2. Given σ and π, let X ⊆ Uσ be the set of states such that the set Uσ is reached 2 2 1 almost-surelyfromX,andassumetowardscontradiction thatX isnon-empty. Thisimpliesthatfromeverystate in X, in the Markov chain G , there is a path to the set Uσ, and from all states in X the successors are in σ,π 1 X. Weconstruct astrategy τ in the three-player game G against strategy π exactly as the strategy constructed σ forwinning bottom scc, withthefollowing difference: instead ofshortening distance theafixedstate ofpriority σ 2r (as for winning bottom scc’s), in this case the strategy τ shortens distance to U . Formally, given X, the 1 strategies σ andπ,weconstruct astrategyτ forplayer3inGasfollows: Foraprobabilistic state(s,m)inX: • if π chooses a state ((s,2ℓ),m′), with ℓ ≥ 1, then τ chooses the state ((s,2ℓ − 1),m′), and then a e σ b successor to shorten the distance to the set U (such a successor exists since from all states in X the set 1 σ U isreachable). 1 σ Againstthestrategyofplayer3inG either(i)U isreachedinfinitelymanysteps,or(ii)elseplayer2infinitely σ 1 often chooses successor states of the form (s,0) with priority 0 (the minimum even priority), i.e., there is a e cycle with a state (s,0) which has priority 0. If priority 0 is visited infinitely often, then the parity objective is e σ satisfied. This ensures that in G player 3 can ensure either to reach U in finitely many steps from some state σ 1 σ σ inU againstπ,ortheparityobjectiveissatisfiedwithoutreachingU . Ineithercasethisimpliesthatagainstπ 2 1 σ player 3canensure tosatisfytheparity objective (byreaching U infinitelymanysteps andthenplaying asure 1 σ σ winning strategy fromU ,orsatisfying theparity objective without reaching U byvisiting priority 0infinitely 1 1 σ σ often)fromsomestateinU ,contradicting thatπ isasurewinningstrategyforplayer2fromU . Thuswehave 2 2 acontradiction, andobtainthedesiredresult. Lemma2.1establishes the desired correctness result asfollows: (1)Ifσ isafinite-memory strategy suchthat in G player3hasasurewinningstrategy, thenbypart1ofLemma2.1weobtainthatσ = Tr (σ)isalmost-sure σ as winning. (2) Conversely, if σ is a finite-memory almost-sure winning strategy, then consider a strategy σ such

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.