Examples in Markov Decision Processes

A. B. Piunovskiy · Imperial College Press · 2012 · 308 pages · English

Imperial College Press Optimization Series, ISSN 2041-1677
Series Editor: Jean Bernard Lasserre (LAAS-CNRS and Institute of Mathematics, University of Toulouse, France)
Vol. 1: Moments, Positive Polynomials and Their Applications, by Jean Bernard Lasserre
Vol. 2: Examples in Markov Decision Processes, by A. B. Piunovskiy

Examples in Markov Decision Processes
A. B. Piunovskiy, The University of Liverpool, UK

Published by Imperial College Press, 57 Shelton Street, Covent Garden, London WC2H 9HE. Distributed by World Scientific Publishing Co. Pte. Ltd., 5 Toh Tuck Link, Singapore 596224. USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601. UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE.

British Library Cataloguing-in-Publication Data: a catalogue record for this book is available from the British Library.

Copyright © 2013 by Imperial College Press. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher. For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 978-1-84816-793-3
Printed in Singapore.

Preface

Markov Decision Processes (MDP) is a branch of mathematics based on probability theory, optimal control, and mathematical analysis. Several books with counterexamples/paradoxes in probability [Stoyanov (1997); Szekely (1986)] and in analysis [Gelbaum and Olmsted (1964)] are in existence; it is therefore not surprising that MDP is also replete with unexpected, counter-intuitive examples. The main goal of the current book is to collect together such examples. Most of them are based on earlier publications; the remainder are new. This book should be considered as a complement to scientific monographs on MDP [Altman (1999); Bertsekas and Shreve (1978); Hernandez-Lerma and Lasserre (1996a); Hernandez-Lerma and Lasserre (1999); Piunovskiy (1997); Puterman (1994)]. It can also serve as a reference book to which one can turn for answers to curiosities that arise while studying or teaching MDP. All the examples are self-contained and can be read independently of each other. Concerning uncontrolled Markov chains, we mention the illuminating collection of examples in [Suhov and Kelbert (2008)]. A survey of meaningful applications is beyond the scope of the current book.

The examples presented either lead to counter-intuitive solutions, or illustrate the importance of conditions in the known theorems. Not all examples are equally simple or complicated: several are aimed at undergraduate students, whilst others will be of interest to professional researchers.
The book has four chapters, in line with the four main types of MDP: the finite-horizon case, the infinite horizon with total or discounted loss, and the average loss over an infinite time interval. Some basic theoretical statements and proofs of auxiliary assertions are included in the Appendix.

The following notations and conventions will often be used without explanation:

- $\triangleq$ means 'equals by definition';
- $C^\infty$ is the space of infinitely differentiable functions;
- $C(X)$ is the space of continuous bounded functions on a (topological) space $X$;
- $B(X)$ is the space of bounded measurable functions on a (Borel) space $X$; in discrete (finite or countable) spaces, the discrete topology is usually supposed to be fixed;
- $P(X)$ is the space of probability measures on the (metrizable) space $X$, equipped with the weak topology;
- if $\Gamma$ is a subset of a space $X$, then $\Gamma^c$ is the complement;
- $\mathbb{N} = \{1, 2, \ldots\}$ is the set of natural numbers; $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$;
- $\mathbb{R}^N$ is the $N$-dimensional Euclidean space; $\mathbb{R} = \mathbb{R}^1$ is the straight line;
- $\mathbb{R}^* = [-\infty, +\infty]$ is the extended straight line;
- $\mathbb{R}^+ = \{y > 0\}$ is the set of strictly positive real numbers;
- $I\{\text{statement}\} = 1$ if the statement is correct and $0$ if the statement is false: the indicator function;
- $\delta_a(dy)$ is the Dirac measure concentrated at point $a$: $\delta_a(\Gamma) = I\{\Gamma \ni a\}$;
- if $r \in \mathbb{R}^*$, then $r^+ \triangleq \max\{0, r\}$ and $r^- \triangleq \min\{0, r\}$;
- $\sum_{i=n}^m f_i \triangleq 0$ and $\prod_{i=n}^m f_i \triangleq 1$ if $m < n$;
- $\lfloor r \rfloor$ is the integer part of $r$: the maximal integer $i$ such that $i \le r$.

Throughout the current book, $X$ is the state space, $A$ is the action space, $p_t(dy|x,a)$ is the transition probability, and $c_t(x,a)$ and $C(x)$ are the loss functions. Normally, we denote random variables with capital letters ($X$), small letters ($x$) being used just for variables, arguments of functions, etc. Bold case ($\mathbf{X}$) is for spaces. All functions, mappings, and stochastic kernels are assumed to be Borel-measurable unless their properties are explicitly specified.

We say that a function on $\mathbb{R}^1$ with values in a Borel space $A$ is piece-wise continuous if there exists a sequence $\{y_i\}$ such that $\lim_{i \to \infty} y_i = \infty$ and $\lim_{i \to -\infty} y_i = -\infty$, this function is continuous on each open interval $(y_i, y_{i+1})$, and there exists a right (left) limit as $y \to y_i + 0$ ($y \to y_{i+1} - 0$), $i = 0, \pm 1, \pm 2, \ldots$. A similar definition is accepted for real-valued piece-wise Lipschitz and continuously differentiable functions.

If $X$ is a measurable space and $\nu$ is a measure on it, then both formulae
$$\int_X f(x)\, d\nu(x) \qquad \text{and} \qquad \int_X f(x)\, \nu(dx)$$
denote the same integral of a real-valued function $f$ with respect to $\nu$.

'w.r.t.' is the abbreviation for 'with respect to', 'a.s.' means 'almost surely', and 'CDF' means 'cumulative distribution function'.

We consider only minimization problems. When formulating theorems and examples published in books (articles) devoted to maximization, we always adjust the statements for our case without any special remarks. It should be emphasized that the terminology in MDP is not entirely fixed: for example, very often strategies are called policies, there exist several slightly different definitions of a semi-continuous model, and so on.

The author is thankful to Dr. R. Sheen and to Dr. M. Ruck for proofreading the entire text.

A. B. Piunovskiy
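To make the standing notation concrete, here is a minimal sketch of a finite-horizon MDP solved by backward induction, in the spirit of the dynamic programming approach of Section 1.3. It is not an example from the book: the horizon and the toy data for $X$, $A$, $p_t(dy|x,a)$, $c_t(x,a)$, and $C(x)$ are invented (randomly generated), and the recursion used is the standard minimization form $v_T = C$, $v_t(x) = \min_a \left[ c_t(x,a) + \sum_y p_t(y|x,a)\, v_{t+1}(y) \right]$.

```python
import numpy as np

# Illustrative toy data only (not from the book), in the book's notation:
#   X = {0, ..., nX-1}    state space
#   A = {0, ..., nA-1}    action space
#   p[t][a][x, y]         transition probability p_t(y | x, a)
#   c[t][x, a]            one-step loss c_t(x, a)  (to be minimized)
#   C[x]                  terminal loss C(x)
T = 3
nX, nA = 2, 2
rng = np.random.default_rng(0)

p = rng.random((T, nA, nX, nX))
p /= p.sum(axis=-1, keepdims=True)   # each p_t(.|x,a) is a probability measure
c = rng.random((T, nX, nA))
C = rng.random(nX)

# Backward induction: v_T = C and, for t = T-1, ..., 0,
#   v_t(x) = min_a [ c_t(x,a) + sum_y p_t(y|x,a) v_{t+1}(y) ].
v = C.copy()
policy = np.zeros((T, nX), dtype=int)        # a Markov selector phi_t(x)
for t in reversed(range(T)):
    q = c[t] + np.einsum('axy,y->xa', p[t], v)   # q[x, a]
    policy[t] = q.argmin(axis=1)                 # minimizing action per state
    v = q.min(axis=1)                            # Bellman function v_t

print("optimal expected total loss v_0:", v)
print("optimal Markov selector phi_t(x):", policy)
```

In this finite setting the minimum in the Bellman equation is always attained, so an optimal Markov selector exists; many of the examples listed below show how this familiar picture can break down in more general models.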
Contents

Preface

1. Finite-Horizon Models
   1.1 Preliminaries
   1.2 Model Description
   1.3 Dynamic Programming Approach
   1.4 Examples
       1.4.1 Non-transitivity of the correlation
       1.4.2 The more frequently used control is not better
       1.4.3 Voting
       1.4.4 The secretary problem
       1.4.5 Constrained optimization
       1.4.6 Equivalent Markov selectors in non-atomic MDPs
       1.4.7 Strongly equivalent Markov selectors in non-atomic MDPs
       1.4.8 Stock exchange
       1.4.9 Markov or non-Markov strategy? Randomized or not? When is the Bellman principle violated?
       1.4.10 Uniformly optimal, but not optimal strategy
       1.4.11 Martingales and the Bellman principle
       1.4.12 Conventions on expectation and infinities
       1.4.13 Nowhere-differentiable function $v_t(x)$; discontinuous function $v_t(x)$
       1.4.14 The non-measurable Bellman function
       1.4.15 No one strategy is uniformly ε-optimal
       1.4.16 Semi-continuous model
