STOCHASTIC
DYNAMIC
PROGRAMMING
SUCCESSIVE APPROXIMATIONS
AND NEARLY OPTIMAL STRATEGIES
FOR MARKOV DECISION PROCESSES
AND MARKOV GAMES
DISSERTATION
TO OBTAIN THE DEGREE OF DOCTOR IN THE
TECHNICAL SCIENCES AT THE TECHNISCHE
HOGESCHOOL EINDHOVEN, BY AUTHORITY OF THE RECTOR
MAGNIFICUS, PROF. IR. J. ERKELENS, TO BE
DEFENDED IN PUBLIC BEFORE A COMMITTEE
APPOINTED BY THE COLLEGE OF DEANS ON
FRIDAY 19 SEPTEMBER 1980 AT 16.00 HOURS
BY
JOHANNES VAN DER WAL
BORN IN AMSTERDAM
1980
MATHEMATISCH CENTRUM, AMSTERDAM
This dissertation has been approved
by the supervisors
Prof.dr. J. Wessels
and
Prof.dr. J.F. Benders
To Willemien
To my mother
CONTENTS
CHAPTER 1. GENERAL INTRODUCTION
1.1. Informal description of the models
1.2. The functional equations 3
1.3. Review of the existing algorithms 4
1.4. Summary of the following chapters 6
1.5. Formal description of the MDP model 9
1.6. Notations 13
CHAPTER 2. THE GENERAL TOTAL REWARD MDP
2.1. Introduction 17
2.2. Some preliminary results 18
2.3. The finite-stage MDP 22
2.4. The optimality equation 26
2.5. The negative case 28
2.6. The restriction to Markov strategies 30
2.7. Nearly-optimal strategies 32
CHAPTER 3. SUCCESSIVE APPROXIMATION METHODS FOR THE TOTAL-REWARD MDP
3.1. Introduction 43
3.2. Standard successive approximations 44
3.3. Successive approximation methods and go-ahead functions 49
3.4. The operators L_δ(π) and U_δ 53
3.5. The restriction to Markov strategies in U_δ v 58
3.6. Value-oriented successive approximations 61
CHAPTER 4. THE STRONGLY CONVERGENT MDP
4.1. Introduction 65
4.2. Conservingness and optimality 70
4.3. Standard successive approximations 73
4.4. The policy iteration method 74
4.5. Strong convergence and Liapunov functions 76
4.6. The convergence of U_δ^n v to v* 80
4.7. Stationary go-ahead functions and strong convergence 86
4.8. Value-oriented successive approximations 88
CHAPTER 5. THE CONTRACTING MDP
5.1. Introduction 93
5.2. The various contractive MDP models 94
5.3. Contraction and strong convergence 103
5.4. Contraction and successive approximations 104
5.5. The discounted MDP with finite state and action spaces 108
5.6. Sensitive optimality 115
CHAPTER 6. INTRODUCTION TO THE AVERAGE-REWARD MDP
6.1. Optimal stationary strategies 117
6.2. The policy iteration method 119
6.3. Successive approximations 123
CHAPTER 7. SENSITIVE OPTIMALITY
7.1. Introduction 129
7.2. The equivalence of k-order average optimality and
(k-1)-discount optimality 131
7.3. Equivalent successive approximation methods 138
CHAPTER 8. POLICY ITERATION, GO-AHEAD FUNCTIONS AND SENSITIVE OPTIMALITY
8.1. Introduction 141
8.2. Some notations and preliminaries 142
8.3. The Laurent series expansion of L_{ρ,δ}(h)v_ρ(f) 146
8.4. The policy improvement step 149
8.5. The convergence proof 153
CHAPTER 9. VALUE-ORIENTED SUCCESSIVE APPROXIMATIONS FOR THE AVERAGE-
REWARD MDP
9.1. Introduction 159
9.2. Some preliminaries 162
9.3. The irreducible case 163
9.4. The general unichain case 166
9.5. Geometric convergence for the unichain case 171
9.6. The communicating case 173
9.7. Simply connectedness 178
9.8. Some remarks 179
CHAPTER 10. INTRODUCTION TO THE TWO-PERSON ZERO-SUM MARKOV GAME
10.1. The model of the two-person zero-sum Markov game 183
10.2. The finite-stage Markov game 185
10.3. Two-person zero-sum Markov games and the restriction
to Markov strategies 190
10.4. Introduction to the ∞-stage Markov game 193
CHAPTER 11. THE CONTRACTING MARKOV GAME
11.1. Introduction 197
11.2. The method of standard successive approximations 201
11.3. Go-ahead functions 203
11.4. Stationary go-ahead functions 206
11.5. Policy iteration and value-oriented methods 209
11.6. The strongly convergent Markov game 212
CHAPTER 12. THE POSITIVE MARKOV GAME WHICH CAN BE TERMINATED BY
THE MINIMIZING PLAYER
12.1. Introduction 215
12.2. Some preliminary results 218
12.3. Bounds on v* and nearly-optimal stationary strategies 222
CHAPTER 13. SUCCESSIVE APPROXIMATIONS FOR THE AVERAGE-REWARD MARKOV GAME
13.1. Introduction and some preliminaries 227
13.2. The unichained Markov game 232
13.3. The functional equation Uv = v+ge has a solution 235
References 239
Symbol index 248
Samenvatting (Dutch summary) 250
Curriculum vitae 253
CHAPTER 1
GENERAL INTRODUCTION