A Tauberian theorem for nonexpansive operators and applications to zero-sum stochastic games

Bruno Ziliotto*

February 24, 2015

Abstract

We prove a Tauberian theorem for nonexpansive operators, and apply it to the model of zero-sum stochastic game. Under mild assumptions, we prove that the value of the λ-discounted game v_λ converges uniformly when λ goes to 0 if and only if the value of the n-stage game v_n converges uniformly when n goes to infinity. This generalizes the Tauberian theorem of Lehrer and Sorin [6] to the two-player zero-sum case. We also provide the first example of a stochastic game with public signals on the state and perfect observation of actions, with finite state space, signal sets and action sets, in which for some initial state k_1 known by both players, (v_λ(k_1)) and (v_n(k_1)) converge to distinct limits.

Introduction

Zero-sum stochastic games were introduced by Shapley [23]. In this model, two players repeatedly play a zero-sum game, which depends on the state of nature. At each stage, a new state of nature is drawn from a distribution based on the actions of players and the state of the previous stage. The state of nature is announced to both players, along with the actions of the previous stage. There are several ways to evaluate the payoff in a stochastic game. For n ∈ N*, the payoff in the n-stage game is the Cesàro mean n^{−1} Σ_{m=1}^{n} g_m, where g_m is the payoff at stage m ≥ 1. For λ ∈ (0,1], the payoff in the λ-discounted game is the Abel mean Σ_{m≥1} λ(1−λ)^{m−1} g_m. Under mild assumptions, the n-stage game and the λ-discounted game have a value, denoted respectively by v_n and v_λ (see Maitra and Parthasarathy [8] and Nowak [13]). A huge part of the literature focuses on the existence of the limit of v_n when n goes to infinity, and of the limit of v_λ when λ goes to 0.
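The two evaluations above are easy to compare numerically. The sketch below is our illustration, not from the paper; the payoff stream g_m = 1/2 + (−1)^m/m is a hypothetical example whose stage payoffs settle around 1/2.

```python
# Illustration (hypothetical example): compare the Cesaro mean
# n^{-1} * sum_{m=1}^n g_m with the Abel mean
# sum_{m>=1} lambda * (1-lambda)^(m-1) * g_m for a bounded payoff stream.

def cesaro_mean(g, n):
    """n-stage evaluation: average of the first n stage payoffs."""
    return sum(g(m) for m in range(1, n + 1)) / n

def abel_mean(g, lam, horizon=100_000):
    """lambda-discounted evaluation, truncated at a large finite horizon."""
    return sum(lam * (1 - lam) ** (m - 1) * g(m) for m in range(1, horizon + 1))

# Hypothetical payoff stream converging to 1/2.
g = lambda m: 0.5 + (-1) ** m / m

print(cesaro_mean(g, 10_000))  # close to 0.5
print(abel_mean(g, 1e-4))      # close to 0.5
```

For a convergent payoff stream both evaluations agree in the limit; the subtle question studied in the paper is the analogous equivalence for game values, where the payoff stream itself depends on optimal play at each horizon.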
Bewley and Kohlberg [1] proved that (v_n) and (v_λ) converge to the same limit, when the state space and action sets are finite. For Markov Decision Processes, this result extends to the case of compact state space, infinite action set and 1-Lipschitz transition (see Rosenberg, Solan and Vieille [20] and Renault [16]). For absorbing games, this result extends to the case of infinite state space, compact action sets and continuous payoff and transition functions (see Mertens, Neyman and Rosenberg [9]). Vigeral [26] provided an example of a stochastic game with finite state space and compact action sets in which neither (v_n) nor (v_λ) converges. A natural question is whether the convergence of (v_n) implies the convergence of (v_λ), and conversely. When (v_λ) is absolutely continuous with respect to λ, Neyman [24, Appendix C, p. 177] proved that (v_n) converges to the limit of (v_λ). In the dynamic programming framework, Lehrer and Sorin [6] proved that (v_n) converges uniformly (with respect to the initial state) if and only if (v_λ) converges uniformly, and that when uniform convergence holds, the two limits coincide.** This result does not hold when uniform convergence is replaced by pointwise convergence (see Sorin [24, Chapter 1, p. 9-10]). In the two-player case, Li and Venel [7] proved that for recursive games (which are stochastic games where the payoff is 0 in nonabsorbing states), (v_n) converges uniformly if and only if (v_λ) converges uniformly, and that when uniform convergence holds the two limits are equal. The generalization of this result to stochastic games was open.

* TSE (GREMAQ, Université Toulouse 1 Capitole), 21 allée de Brienne, 31000 Toulouse, France. E-mail: [email protected]

** For a proof of this result in a continuous-time model, see Oliu-Barton and Vigeral [14]. Still in continuous time, in a very recent independent paper, Khlopin [4] has generalized this result to the two-player case.
Mertens, Sorin and Zamir [10, Chapter IV] have introduced a general model of stochastic game with signals, in which players neither observe the state nor the action of their opponent, but instead observe at every stage a signal correlated to the current state and the actions which have just been played (state space, action and signal sets are assumed to be finite). Ziliotto [27] provided an example of a stochastic game with public signals on the state and perfect observation of actions, such that (v_n) and (v_λ) fail to converge (for special classes of stochastic games with signals in which (v_n) and (v_λ) converge to the same limit, see [2, 11, 15, 17, 19, 20, 21, 25]). The question of the relation between the convergence of (v_n) and (v_λ) was also open. By Mertens, Sorin and Zamir [10, Chapter III], one can associate to any stochastic game with signals an auxiliary stochastic game with perfect observation of the state and actions, which has the same n-stage and λ-discounted values. The state space of this auxiliary game is infinite and compact metric, and is the set of infinite higher-order beliefs of players about the state. That is why in this paper we first study stochastic games, and then apply our results to stochastic games with signals.

The contribution of this paper is twofold. First, it generalizes both the result of Lehrer and Sorin [6] and that of Li and Venel [7] to stochastic games. We consider any stochastic game (with possibly infinite state space and action sets) in which for all n ∈ N* and λ ∈ (0,1], v_n and v_λ exist and satisfy the Shapley equations, and prove the following Tauberian theorem: (v_n) converges uniformly if and only if (v_λ) converges uniformly, and when uniform convergence holds the two limits are equal. This theorem applies to many standard models in the literature: dynamic programming, stochastic games with finite state space and compact action sets, stochastic games with signals, hidden stochastic games, and Markov chain games.
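In the dynamic programming case, for instance, the two families of values can be computed directly by iterating the one-stage recursions. A minimal numerical sketch (the three-state problem below is a hypothetical example of ours, not from the paper):

```python
# Hypothetical 3-state dynamic programming problem: K = {0, 1, 2}, the
# decision-maker may stay or move right, and state 2 has the best payoff.
K = [0, 1, 2]
F = {0: [0, 1], 1: [1, 2], 2: [2]}   # admissible transitions k -> F(k)
g = {0: 0.0, 1: 0.3, 2: 1.0}         # stage payoff

def psi(f):
    """One-stage operator: Psi(f)(k) = g(k) + max_{k' in F(k)} f(k')."""
    return {k: g[k] + max(f[kp] for kp in F[k]) for k in K}

def v_n(n):
    """n-stage value: v_n = Psi^n(0) / n."""
    f = dict.fromkeys(K, 0.0)
    for _ in range(n):
        f = psi(f)
    return {k: f[k] / n for k in K}

def v_lam(lam, iters=10_000):
    """Discounted value: fixed point of v = lam*g + (1-lam)*max v(F(.))."""
    f = dict.fromkeys(K, 0.0)
    for _ in range(iters):
        f = {k: lam * g[k] + (1 - lam) * max(f[kp] for kp in F[k]) for k in K}
    return f

# From any state the decision-maker can reach state 2 and stay there, so both
# value families approach g(2) = 1 in every state.
print(v_n(5000))
print(v_lam(1e-3))
```

Here both families converge, in every state, to the same limit, which is what the Tauberian theorem asserts in general under uniform convergence.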
The proof of our result relies on the operator approach, introduced by Rosenberg and Sorin [22]. This approach relies on the fact that the values of the n-stage game and the λ-discounted game satisfy a functional equation, called the Shapley equation (see Shapley [23]). The properties of the associated nonexpansive operator can be exploited to infer convergence properties of (v_n) and (v_λ) (see Rosenberg and Sorin [22]). Thus, we start by proving a Tauberian theorem for nonexpansive operators, and then apply it to stochastic games.

Second, this paper provides the first example of a stochastic game with public signals on the state and perfect observation of the actions (hidden stochastic game), with finite state space, signal sets and action sets, in which for some initial state k_1 known by both players, (v_λ(k_1)) and (v_n(k_1)) converge to distinct limits (note that in the example in Sorin [24, Chapter 1, p. 9-10], the state space is infinite and not compact). An example of a stochastic game with finite state space, compact action sets, perfect observation of the state and actions, and having the same property can be deduced from this example. Thus, our example shows that as soon as the state is imperfectly observed, or the state space is not finite, or the action sets are not finite, there is no link between the convergence of (v_λ(k_1)) and (v_n(k_1)), where k_1 is some initial state.

The paper is organized as follows. In the first section, a Tauberian theorem for nonexpansive operators is stated and proved. In the second section, a Tauberian theorem for stochastic games is deduced from the first section. In the third section, particular cases of stochastic games are considered. The fourth section presents the aforementioned example.

1 Nonexpansive operators

Let (X, ‖·‖) be a Banach space, and Ψ : X → X be a nonexpansive mapping, that is:

  ∀(f,g) ∈ X², ‖Ψ(f) − Ψ(g)‖ ≤ ‖f − g‖.

By a standard fixed-point argument (see Sorin [24, Appendix C]), there exists a bounded family (v_λ)_{λ∈(0,1]} such that for all λ ∈ (0,1],

  v_λ = λΨ((1−λ)λ^{−1} v_λ).   (1.1)

For n ∈ N*, define

  v_n := n^{−1} Ψ^n(0),   (1.2)

where Ψ^n is the n-th iterate of Ψ. Because Ψ is nonexpansive, (v_n)_{n≥1} is bounded.

Kohlberg and Neyman [5] provided conditions under which lim_{n→+∞} v_n and lim_{λ→0} v_λ exist. In this section, we investigate the link between the existence of lim_{n→+∞} v_n and of lim_{λ→0} v_λ. We make the following assumption:

Assumption 1. There exists C > 0 such that for all λ, λ′ ∈ (0,1] and f ∈ X,

  ‖λΨ(λ^{−1}f) − λ′Ψ(λ′^{−1}f)‖ ≤ C|λ − λ′|.

Remark 1.1. An important class of operators which satisfy Assumption 1 is the following. Let K be any set, and X be the set of bounded real-valued functions defined on K, equipped with the uniform norm. Consider two sets S and T, and a family of linear forms (P_{k,s,t})_{(k,s,t)∈K×S×T} on X, such that for all (k,s,t), P_{k,s,t} is of norm smaller than 1. Let g : K×S×T → R be a bounded function. Define Ψ : X → X by Ψ(f)(k) := sup_{s∈S} inf_{t∈T} {g(k,s,t) + P_{k,s,t}(f)}, for all f ∈ X and k ∈ K. This class includes Shapley operators (see Neyman [12, p. 397-415]): this corresponds to the case where K is the state space of some zero-sum stochastic game, S (resp. T) is the set of mixed actions of Player 1 (resp. 2), k is the current state, and P_{k,s,t}(f) is the expectation of f(k′) under mixed actions s and t, where k′ is the state at next stage. Under suitable assumptions, for all n ∈ N* and λ ∈ (0,1], v_n and v_λ are respectively the value of the n-stage game and the value of the λ-discounted game. This point will be useful in Sections 2 and 3.

We now state a Tauberian theorem for nonexpansive operators satisfying Assumption 1.

Theorem 1.2. Under Assumption 1, the two following statements are equivalent:

(a) The sequence (v_n)_{n≥1} converges when n goes to infinity.

(b) The mapping λ → v_λ has a limit when λ goes to 0.

Moreover, when these statements hold, we have lim_{n→+∞} v_n = lim_{λ→0} v_λ.

The remainder of this section is dedicated to the proof of the theorem.

Definition 1.3. Let λ ∈ (0,1] and n ∈ N*. The operator Ψ^n_λ : X → X is defined recursively by Ψ^0_λ(f) := f for all f ∈ X, and for n ≥ 1:

  ∀f ∈ X, Ψ^n_λ(f) := λΨ((1−λ)λ^{−1} Ψ^{n−1}_λ(f)).

Note that equation (1.1) reads v_λ = Ψ^1_λ(v_λ).

Lemma 1.4. Let f, g ∈ X, λ ∈ (0,1], n ∈ N* and t ∈ {1, 2, ..., n}. Then

(i) ‖Ψ^t_λ(f) − Ψ^t_λ(g)‖ ≤ (1−λ)^t ‖f − g‖,

(ii) ‖Ψ^t_{n^{−1}}(f) − n^{−1}Ψ^t((n−t)f)‖ ≤ (C + ‖f‖)[t n^{−1} − 1 + (1−n^{−1})^t].

Proof.

(i) This follows from the nonexpansiveness of Ψ.

(ii) We have

  ‖Ψ^t_{n^{−1}}(f) − n^{−1}Ψ^t((n−t)f)‖
    ≤ ‖(1−n^{−1})Ψ^{t−1}_{n^{−1}}(f) − n^{−1}Ψ^{t−1}((n−t)f)‖
    ≤ C[n^{−1} − (1−n^{−1})n^{−1}] + ‖(1−n^{−1})²Ψ^{t−2}_{n^{−1}}(f) − n^{−1}Ψ^{t−2}((n−t)f)‖
    ≤ C Σ_{m=1}^{t} [n^{−1} − (1−n^{−1})^{m−1}n^{−1}] + ‖(1−n^{−1})^t f − n^{−1}(n−t)f‖
    = C[t n^{−1} − 1 + (1−n^{−1})^t] + [(1−n^{−1})^t − n^{−1}(n−t)]‖f‖
    = (C + ‖f‖)[t n^{−1} − 1 + (1−n^{−1})^t].

The first inequality stems from the nonexpansiveness of Ψ. In the second inequality, we applied Assumption 1 for λ = (1−n^{−1})n^{−1}, λ′ = n^{−1} and f̃ = (1−n^{−1})²Ψ^{t−2}_{n^{−1}}(f), and used the nonexpansiveness of Ψ. Applying successively Assumption 1 for λ = (1−n^{−1})^{m−1}n^{−1}, λ′ = n^{−1} and f̃ = (1−n^{−1})^m Ψ^{t−m}_{n^{−1}}(f) (m ∈ {1, ..., t}) together with the nonexpansiveness of Ψ yields the third inequality.

We now prove that (a) implies (b).

(a) ⇒ (b). Assume (a). Let (λ,λ′) ∈ (0,1]². We have

  ‖v_λ − v_{λ′}‖ = ‖λΨ((1−λ)λ^{−1}v_λ) − λ′Ψ((1−λ′)λ′^{−1}v_{λ′})‖
    ≤ C|λ−λ′| + ‖λ′Ψ((1−λ)λ′^{−1}v_λ) − λ′Ψ((1−λ′)λ′^{−1}v_{λ′})‖
    ≤ (C + ‖v_λ‖)|λ−λ′| + (1−λ′)‖v_λ − v_{λ′}‖.

In the first inequality, we applied Assumption 1 to f = (1−λ)v_λ, and in the second inequality, we applied twice the nonexpansiveness of Ψ. We deduce the existence of A > 0 such that for all (λ,λ′) ∈ (0,1]², ‖v_λ − v_{λ′}‖ ≤ A|λ−λ′|λ′^{−1}. Consequently, in order to prove (b), it is sufficient to prove that (v_{n^{−1}})_{n≥1} converges when n goes to infinity.

By (a), there exists v* ∈ X such that (v_n)_{n≥1} converges to v*. Let ε ∈ (0,1/4). Let N_0 ∈ N* be such that for all n ≥ N_0,

  ‖v_n − v*‖ ≤ ε²/2.   (1.3)

Let n ≥ ε^{−2}N_0, λ := n^{−1}, and t := ⌊εn⌋, where ⌊x⌋ denotes the integer part of x. Equations (1.1) and (1.2) yield

  v_λ = Ψ^t_λ(v_λ),   (1.4)

and

  v_n = n^{−1}Ψ^t((n−t)v_{n−t}).   (1.5)

We have

  ‖v_λ − v_n‖ ≤ ‖v_λ − Ψ^t_λ(v_{n−t})‖ + ‖Ψ^t_λ(v_{n−t}) − v_n‖.

Applying first (1.4) and Lemma 1.4 (i), then (1.3), we obtain

  ‖v_λ − Ψ^t_λ(v_{n−t})‖ ≤ (1−λ)^t ‖v_λ − v_{n−t}‖
    ≤ (1−λ)^t ‖v_λ − v_n‖ + (1−λ)^t ‖v_n − v_{n−t}‖
    ≤ (1−λ)^t ‖v_λ − v_n‖ + ε².

Let M := C + sup_{n∈N} ‖v_n‖. Equality (1.5) and Lemma 1.4 (ii) yield

  ‖Ψ^t_λ(v_{n−t}) − v_n‖ ≤ (C + ‖v_{n−t}‖)[t n^{−1} − 1 + (1−n^{−1})^t]
    ≤ M(ε − 1 + e^{−ε+ε²}).

The last two inequalities yield

  ‖v_λ − v_n‖ ≤ (1−λ)^t ‖v_λ − v_n‖ + ε² + M(ε − 1 + e^{−ε+ε²})
    ≤ e^{−ε+ε²} ‖v_λ − v_n‖ + ε² + M(ε − 1 + e^{−ε+ε²}).

We deduce that

  ‖v_λ − v_n‖ ≤ [ε² + M(ε − 1 + e^{−ε+ε²})](1 − e^{−ε+ε²})^{−1}.

The right-hand side goes to 0 when ε goes to 0, thus (b) holds.

(b) ⇒ (a). Assume (b). Let ε_0 ∈ (0,1) be such that for all ε ≤ ε_0, e^{−ε} ≤ 1 − ε + ε². Fix ε ∈ (0, ε_0/2], and define r_0 := ⌊ε^{−3/2}⌋. By (b), (v_λ) converges to some v* ∈ X, and there exists β ∈ (0,1) such that for all λ ∈ (0,β],

  ‖v_λ − v*‖ ≤ ε²/2.   (1.6)

Let N ≥ 1 be such that ⌊(1−ε)^{r_0−1}N⌋ ≥ (βε)^{−1}. Let n ≥ N. For r ∈ N*, define n_r := ⌊(1−ε)^{r−1}n⌋ and λ_r := 1/n_r. The following assertions hold:

Lemma 1.5.

(i) ∀r ∈ {1, ..., r_0}, λ_r ≤ βε.

(ii) ∀r ∈ {1, ..., r_0−1}, (1 − 1/n_r)^{n_r−n_{r+1}} − n_{r+1}/n_r ≤ 4ε².

(iii) ∀r ∈ {1, ..., r_0−1}, ‖v_{λ_r} − v_{λ_{r+1}}‖ ≤ ε².

Proof.

(i) Let r ∈ {1, ..., r_0}. We have ⌊(1−ε)^{r−1}n⌋ ≥ (βε)^{−1}, thus λ_r ≤ βε.

(ii) Let r ∈ {1, ..., r_0−1}. We have

  (1 − 1/n_r)^{n_r−n_{r+1}} ≤ e^{−(n_r−n_{r+1})/n_r}
    ≤ 1 − (n_r−n_{r+1})/n_r + [(n_r−n_{r+1})/n_r]²
    = n_{r+1}/n_r + (1 − n_{r+1}/n_r)²
    ≤ n_{r+1}/n_r + (ε + 1/n_r)²
    ≤ n_{r+1}/n_r + 4ε².

(iii) It is a direct consequence of (1.6) and (i).

Let r ∈ {1, ..., r_0−1}. Equations (1.1) and (1.2) yield

  v_{n_r} = (n_r)^{−1} Ψ^{n_r−n_{r+1}}(n_{r+1} v_{n_{r+1}}),   (1.7)

and

  v_{λ_r} = Ψ^{n_r−n_{r+1}}_{λ_r}(v_{λ_r}).   (1.8)

We have

  ‖v_{n_r} − v_{λ_r}‖ ≤ ‖v_{n_r} − (n_r)^{−1}Ψ^{n_r−n_{r+1}}(n_{r+1} v_{λ_r})‖ + ‖(n_r)^{−1}Ψ^{n_r−n_{r+1}}(n_{r+1} v_{λ_r}) − v_{λ_r}‖.

Applying first (1.7) and then Lemma 1.5 (iii) yields

  ‖v_{n_r} − (n_r)^{−1}Ψ^{n_r−n_{r+1}}(n_{r+1} v_{λ_r})‖ ≤ (n_r)^{−1} n_{r+1} ‖v_{n_{r+1}} − v_{λ_r}‖
    ≤ (n_r)^{−1} n_{r+1} (‖v_{n_{r+1}} − v_{λ_{r+1}}‖ + ε²).

Let M := C + sup_{n∈N*} ‖v_n‖ + sup_{λ∈(0,1]} ‖v_λ‖ + 1. Equality (1.8) and Lemma 1.4 (ii) yield

  ‖(n_r)^{−1}Ψ^{n_r−n_{r+1}}(n_{r+1} v_{λ_r}) − v_{λ_r}‖ ≤ (C + ‖v_{λ_r}‖)[(n_r−n_{r+1})/n_r − 1 + (1 − 1/n_r)^{n_r−n_{r+1}}]
    ≤ M[(1 − 1/n_r)^{n_r−n_{r+1}} − n_{r+1}/n_r]
    ≤ 4ε²M.

We deduce that

  n_r ‖v_{n_r} − v_{λ_r}‖ ≤ n_{r+1} (‖v_{n_{r+1}} − v_{λ_{r+1}}‖ + ε²) + 4ε²M n_r
    ≤ n_{r+1} ‖v_{n_{r+1}} − v_{λ_{r+1}}‖ + 5ε²M n_r.

Summing from r = 1 to r = r_0−1 yields

  ‖v_n − v_{n^{−1}}‖ ≤ n_{r_0} n^{−1} ‖v_{n_{r_0}} − v_{λ_{r_0}}‖ + 5ε²M r_0
    ≤ 2(1−ε)^{⌊ε^{−3/2}⌋−1} M + 5ε^{1/2} M.

The right-hand side goes to 0 as ε goes to 0, thus (a) holds.

Note that the proof of the two implications shows that when (a) and (b) hold, we have lim_{n→+∞} v_n = lim_{λ→0} v_λ.

2 Applications to zero-sum stochastic games

In this section, we apply Theorem 1.2 to zero-sum stochastic games.

2.1 Dynamic programming

A dynamic programming problem is described by a state space K, a nonvoid correspondence F : K ⇒ K, and a bounded payoff function g : K → R.

The problem proceeds as follows. Given an initial state k_1 ∈ K, at each stage the decision-maker chooses k_{m+1} ∈ F(k_m), and gets the stage payoff g(k_m). For λ ∈ (0,1] (resp. n ∈ N*), in the λ-discounted problem (resp. n-stage problem) the decision-maker maximizes the total payoff Σ_{m≥1} λ(1−λ)^{m−1} g(k_m) (resp. n^{−1} Σ_{m=1}^{n} g(k_m)).

A strategy for the decision-maker assigns a decision k_{m+1} ∈ F(k_m) to each finite history (k_1, k_2, ..., k_m). We denote respectively by v_λ(k_1) and v_n(k_1) the value of the λ-discounted problem and of the n-stage problem:

  v_λ(k_1) := sup_{s∈S} Σ_{m≥1} λ(1−λ)^{m−1} g(k_m)  and  v_n(k_1) := sup_{s∈S} n^{−1} Σ_{m=1}^{n} g(k_m),

where S is the set of strategies.

Let X be the set of bounded real-valued functions defined on K, equipped with the uniform norm. For (f,k) ∈ X × K, let

  Ψ(f)(k) := g(k) + sup_{k′∈F(k)} f(k′).

X is a Banach space and Ψ is a nonexpansive operator which satisfies Assumption 1. Standard dynamic programming gives (see Lehrer and Sorin [6]):

  v_λ(k) = λg(k) + (1−λ) sup_{k′∈F(k)} v_λ(k′) = [λΨ((1−λ)λ^{−1}v_λ)](k)

and

  v_n(k) = n^{−1}g(k) + (1−n^{−1}) sup_{k′∈F(k)} v_{n−1}(k′) = [n^{−1}Ψ((n−1)v_{n−1})](k).

Applying Theorem 1.2, we recover the Tauberian theorem proved in Lehrer and Sorin [6]: (v_n) converges uniformly on K if and only if (v_λ) converges uniformly on K, and when uniform convergence holds, the two limits coincide.

2.2 Zero-sum stochastic games

If C is a Borel subset of a Polish space, we denote by ∆(C) the set of probability measures on C, equipped with the weak* topology. We use the same framework as in Maitra and Parthasarathy [8].
We consider a general model of zero-sum stochastic game, described by a state space K which is a Borel subset of a Polish space, two action sets I and J, which are Borel subsets of a Polish space, a Borel measurable transition function q : K × I × J → ∆(K), and a bounded Borel measurable payoff function g : K × I × J → R.

The initial state is k_1 ∈ K, and the stochastic game Γ(k_1) which starts in k_1 proceeds as follows. At each stage m ≥ 1, both players choose simultaneously and independently an action, i_m ∈ I (resp. j_m ∈ J) for Player 1 (resp. 2). The payoff at stage m is g_m := g(k_m, i_m, j_m). The state k_{m+1} of stage m+1 is drawn from the probability distribution q(k_m, i_m, j_m). Then (k_{m+1}, i_m, j_m) is publicly announced to both players.

The set of all possible histories before stage m is H_m := (K×I×J)^{m−1} × K. A behavioral strategy for Player 1 (resp. 2) is a Borel measurable mapping σ : ∪_{m≥1} H_m → ∆(I) (resp. τ : ∪_{m≥1} H_m → ∆(J)). A triple (k_1, σ, τ) ∈ K × Σ × T induces a probability measure on H_∞ := (K×I×J)^{N*}, denoted by P^{k_1}_{σ,τ}. Let λ ∈ (0,1]. The λ-discounted game Γ_λ(k_1) is the game defined by its normal form (Σ, T, γ^{k_1}_λ), where

  γ^{k_1}_λ(σ,τ) := E^{k_1}_{σ,τ}[ Σ_{m≥1} λ(1−λ)^{m−1} g_m ].

Let n ∈ N*. The n-stage game Γ_n(k_1) is the game defined by its normal form (Σ, T, γ^{k_1}_n), where

  γ^{k_1}_n(σ,τ) := E^{k_1}_{σ,τ}[ n^{−1} Σ_{m=1}^{n} g_m ].

Let f : K → R be a bounded Borel measurable function, and (k, x, y) ∈ K × ∆(I) × ∆(J). Define

  E^{k}_{x,y}(f) := ∫_{(k′,i,j)∈K×I×J} f(k′) dq(k,i,j)(k′) dx(i) dy(j)

and

  g(k,x,y) := ∫_{(i,j)∈I×J} g(k,i,j) dx(i) dy(j).

We make the following assumption:

Assumption 2. For all k_1 ∈ K, λ ∈ (0,1] and n ∈ N*, the games Γ_λ(k_1) and Γ_n(k_1) have a value, that is, there exist real numbers v_λ(k_1) and v_n(k_1) such that:

  v_λ(k_1) = sup_{σ∈Σ} inf_{τ∈T} γ^{k_1}_λ(σ,τ) = inf_{τ∈T} sup_{σ∈Σ} γ^{k_1}_λ(σ,τ),

and

  v_n(k_1) = sup_{σ∈Σ} inf_{τ∈T} γ^{k_1}_n(σ,τ) = inf_{τ∈T} sup_{σ∈Σ} γ^{k_1}_n(σ,τ).

Moreover, v_λ and v_n are Borel measurable, and satisfy the following Shapley equations:

  v_λ(k_1) = sup_{x∈∆(I)} inf_{y∈∆(J)} {λ g(k_1,x,y) + (1−λ) E^{k_1}_{x,y}(v_λ)}
           = inf_{y∈∆(J)} sup_{x∈∆(I)} {λ g(k_1,x,y) + (1−λ) E^{k_1}_{x,y}(v_λ)},

and

  v_n(k_1) = sup_{x∈∆(I)} inf_{y∈∆(J)} {n^{−1} g(k_1,x,y) + (1−n^{−1}) E^{k_1}_{x,y}(v_{n−1})}
           = inf_{y∈∆(J)} sup_{x∈∆(I)} {n^{−1} g(k_1,x,y) + (1−n^{−1}) E^{k_1}_{x,y}(v_{n−1})}.

Let X be the set of bounded Borel measurable functions from K to R, equipped with the uniform norm, and for all (f,k) ∈ X × K, define

  Ψ(f)(k) := sup_{x∈∆(I)} inf_{y∈∆(J)} {g(k,x,y) + E^{k}_{x,y}(f)}.

We make the following assumption:

Assumption 3. For all f ∈ X, Ψ(f) is Borel measurable.

Remark 2.1. When K, I and J are compact metric spaces and q and g are jointly continuous, Assumptions 2 and 3 hold. Maitra and Parthasarathy [8] and Nowak [13] provided weaker conditions under which Assumptions 2 and 3 hold.

Theorem 1.2 yields the following Tauberian theorem for stochastic games:

Theorem 2.2. Under Assumptions 2 and 3, the two following statements are equivalent:

(a) The family of functions (v_n)_{n≥1} converges uniformly on K when n goes to infinity.

(b) The family of functions (v_λ)_{λ∈(0,1]} converges uniformly on K when λ goes to 0.

Moreover, when these statements hold, we have lim_{n→+∞} v_n = lim_{λ→0} v_λ.

Proof. X is a Banach space, and Assumption 3 ensures that Ψ is well defined from X to X. Moreover, Ψ is a nonexpansive operator which satisfies Assumption 1. Thus Theorem 1.2 applies to Ψ. By Assumption 2, the families of values (v_λ) and (v_n) satisfy equations (1.1) and (1.2), and the result is proved.

2.3 Stochastic games with signals

Assume K, I and J to be finite. The previous model can be generalized in the following way. Let A (resp. B) be a finite set of signals for Player 1 (resp. 2). Instead of observing the past actions (i_m, j_m) and the future state k_{m+1} at the end of each stage m, Player 1 (resp. 2) gets a private signal a_m (resp. b_m), which is correlated to (k_m, i_m, j_m) (see Mertens, Sorin and Zamir [10, Chapter IV] for more details). This defines a stochastic game with signals, denoted by Γ. A behavioral strategy for a player assigns a mixed action to each of his private histories (that is, all the actions he has played and all the signals he has received before the current stage). Because K, I, J, A and B are finite, the λ-discounted game and the n-stage game have a value for all λ ∈ (0,1] and n ∈ N*.

By Mertens, Sorin and Zamir [10, Chapter III], there exists a stochastic game Γ̃ (in the sense of the previous subsection) with perfect observation of the state and actions, which is equivalent to Γ: it has the same λ-discounted and n-stage values. The state space of this auxiliary stochastic game is the set of infinite higher-order beliefs of the players about the state: this is the universal belief space, denoted by B. The set B is compact metric, and Assumptions 2 and 3 are satisfied. Thus Theorem 2.2 applies to the auxiliary stochastic game Γ̃.

Proposition 2.3. The two following statements are equivalent:

(a) The family of functions (v_n)_{n≥1} converges uniformly on B when n goes to infinity.

(b) The family of functions (v_λ)_{λ∈(0,1]} converges uniformly on B when λ goes to 0.

Moreover, when these statements hold, we have lim_{n→+∞} v_n = lim_{λ→0} v_λ.

Remark 2.4. It is not known in general if the families (v_n)_{n≥1} and (v_λ)_{λ∈(0,1]} are equicontinuous. Thus uniform convergence may be difficult to prove, even when pointwise convergence holds. In the examples of the next section, (v_n)_{n≥1} and (v_λ)_{λ∈(0,1]} are equi-Lipschitz, thus pointwise convergence and uniform convergence are equivalent in these examples.

3 Examples of zero-sum stochastic games

We apply the results of the previous section to several standard examples of zero-sum stochastic games.
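As a concrete warm-up to these examples, the finite case can be simulated directly. The sketch below is a hypothetical 2-state, 2-action game of ours (not one of the paper's examples): the Shapley recursions are iterated to compare v_n and v_λ, with each 2×2 matrix-game value computed by the classical closed form.

```python
# Hypothetical stochastic game: state 0 is "matching pennies", except that the
# joint action (0,0) absorbs the play into state 1, where the payoff is 0 forever.

def val2x2(a):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    maximin = max(min(row) for row in a)
    minimax = min(max(a[0][j], a[1][j]) for j in range(2))
    if maximin == minimax:                        # pure saddle point
        return maximin
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return det / (a[0][0] + a[1][1] - a[0][1] - a[1][0])

g   = [[[1.0, 0.0], [0.0, 1.0]],   # stage payoff in state 0
       [[0.0, 0.0], [0.0, 0.0]]]   # state 1: payoff 0
nxt = [[[1, 0], [0, 0]],           # deterministic transition from state 0
       [[1, 1], [1, 1]]]           # state 1 is absorbing

def v_n(n):
    """n-stage values: iterate the unnormalized Shapley recursion, then divide by n."""
    f = [0.0, 0.0]                  # f equals m * v_m after m iterations
    for _ in range(n):
        f = [val2x2([[g[k][i][j] + f[nxt[k][i][j]] for j in range(2)]
                     for i in range(2)]) for k in range(2)]
    return [x / n for x in f]

def v_lam(lam, iters=20_000):
    """Discounted values: fixed-point iteration of the Shapley equation."""
    f = [0.0, 0.0]
    for _ in range(iters):
        f = [val2x2([[lam * g[k][i][j] + (1 - lam) * f[nxt[k][i][j]]
                      for j in range(2)] for i in range(2)]) for k in range(2)]
    return f

print(v_n(2000))     # both coordinates near 0
print(v_lam(1e-3))
```

In this toy game the threat of absorption drives both values to 0 in every state, and v_n and v_λ agree in the limit, consistent with Theorem 2.2.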
3.1 Stochastic games with compact action sets and finite state space

We consider the case where the state space is finite, the action sets are compact, and the transition and payoff functions are separately continuous. By the standard minmax theorem, Assumptions 2 and 3 hold (it is a particular case of Maitra and Parthasarathy [8]). Because K is finite, uniform convergence and pointwise convergence with respect to the state variable are equivalent. Theorem 2.2 yields the following proposition:

Proposition 3.1. In a stochastic game with finite state space and compact action sets, the two following statements are equivalent:

(a) For all k_1 ∈ K, (v_n(k_1)) converges when n goes to infinity.

(b) For all k_1 ∈ K, (v_λ(k_1)) converges when λ goes to 0.

Moreover, when these statements hold, we have lim_{n→+∞} v_n(k_1) = lim_{λ→0} v_λ(k_1) for all k_1 ∈ K.

3.2 Hidden stochastic games

Consider the following example of stochastic game with signals. Assume that K, I and J are finite, and that players do not observe the current state at each stage (they observe past actions). Instead, they receive a public signal about it, lying in some finite set A (see Renault and Ziliotto [18] for more details). In this particular case, the universal belief space is B = ∆(K): this corresponds to the common belief of the players about the state (see Ziliotto [27]). Thus, (v_λ) and (v_n) can be considered as families of maps from ∆(K) to R. They are both equi-Lipschitz, thus for these families, pointwise convergence and uniform convergence are equivalent. By Proposition 2.3, the following result holds.

Proposition 3.2. In a hidden stochastic game, the two following statements are equivalent:

(a) For all p_1 ∈ ∆(K), (v_n(p_1)) converges when n goes to infinity.

(b) For all p_1 ∈ ∆(K), (v_λ(p_1)) converges when λ goes to 0.

Moreover, when these statements hold, we have lim_{n→+∞} v_n(p_1) = lim_{λ→0} v_λ(p_1) for all p_1 ∈ ∆(K).

3.3 Markov chain games with incomplete information on both sides

Consider the following example of stochastic game with signals. Assume that K, I and J are finite, and that the state space is a product K = C × D, such that the two components of the state follow independent Markov chains. Players know the transition and the initial distribution of each Markov chain, but only Player 1 (resp. 2) observes the realization at stage 1 of the first (resp. second) component. From stage 2 on, they do not observe the state. They observe past actions (see Gensbittel and Renault [3] for more details). In this particular case, the equivalent stochastic game with perfect observation of the state has state space ∆(C) × ∆(D), that is, the product of the set of possible beliefs of Player 2 about the initial state of the first Markov chain, and of the set of possible beliefs of Player 1 about the initial state of the second Markov chain. Thus, (v_λ) and (v_n) can be considered as families of maps from ∆(C) × ∆(D) to R. They are both equi-Lipschitz, thus for these families, pointwise convergence and uniform convergence are equivalent. Gensbittel and Renault [3] proved that (v_n) converges, and asked whether (v_λ) converges. By Remark 2.1, Assumptions 2 and 3 hold, and from Theorem 2.2 we deduce the following result:

Proposition 3.3. In a Markov chain game with incomplete information on both sides, for all p_1 ∈ ∆(C) × ∆(D), (v_n(p_1)) and (v_λ(p_1)) converge to the same limit.

4 An example

In this section, we prove the following theorem:

Theorem 4.1. There exists a hidden stochastic game such that for some initial state k_1 ∈ K known by both players, (v_λ(k_1)) and (v_n(k_1)) converge to distinct limits.

Remark 4.2. As proved in Ziliotto [27, Section 4, p. 21], this hidden stochastic game can be adapted, in order to get an example of a stochastic game with compact action sets and finite state space, such that for some initial state k_1 ∈ K, (v_λ(k_1)) and (v_n(k_1)) converge to distinct limits. It is also possible to build an example of a hidden stochastic game such that for some initial state k_1 ∈ K known by both players, (v_λ(k_1)) converges but (v_n(k_1)) does not, and conversely, an example where (v_n(k_1)) converges but (v_λ(k_1)) does not.

Before going to the proof, we provide some piece of intuition. In Ziliotto [27], a hidden stochastic game Γ is constructed, in which neither (v_λ(k_1)) nor (v_n(k_1)) converges, where k_1 is an initial state known by both players. In the discounted game, there exist optimal stationary strategies (that is, strategies which only depend on the common belief about the current state). In this example, a stationary strategy for Player 1 (resp. 2) is equivalent to the choice of an integer a ∈ rN (resp. b ∈ 2rN). Apart from the fact that Player 2's set of stationary strategies is smaller, the game is symmetric. In Γ_λ(k_1), the optimal choice for both players is to choose m as close as possible to −ln(√(2λ))/ln(2). For some λ, the closest integer lies in r(2N+1), and Player 1 has an advantage, whereas for some other discount factors, it lies in 2rN, and Player 1 has no advantage. This is why (v_λ(k_1)) oscillates (between 1/2 and some l ∈ (1/2,1]). In Γ_n(k_1), there may not exist optimal stationary strategies. Depending on the stage of the game m ∈ {1,...,n}, the optimal integer for Player 1 lies in 2rN, or in r(2N+1). Thus, according to the stage of the game, Player 1 may or may not have an advantage. This is in sharp contrast to the discounted game Γ_λ(k_1), in which either Player 1 always has an advantage at any stage of the game, or never has one. That is why we believe that in this example, liminf_{n→+∞} v_n(k_1) > 1/2 (but we were not able to prove it).
Nonetheless, one can construct a hidden stochastic game Γ^1, very similar to Γ, in which liminf_{n→+∞} v^1_n(k_1) > 1/2 (this corresponds to Step 1 of the proof). In Step 2, we construct a hidden stochastic game Γ^2, whose only difference with Γ^1 is that Player 2's set of stationary strategies is equivalent to r(2N+1), instead of 2rN. In Γ^2, we also have liminf_{n→+∞} v^2_n(k_2) > 1/2, where k_2 is an initial state known by both players. In Step 3, we define the hidden stochastic game Γ^3, in which, starting from an initial state ω_3, Player 2 chooses between playing Γ^1(k_1) or Γ^2(k_2). In Γ^3_λ(ω_3),
