
Continuous-Time Markov Decision Processes PDF

605 Pages·2020·6.492 MB·English

Preview Continuous-Time Markov Decision Processes

Probability Theory and Stochastic Modelling, Volume 97

Alexey Piunovskiy · Yi Zhang

Continuous-Time Markov Decision Processes: Borel Space Models and General Control Strategies

Editors-in-Chief: Peter W. Glynn, Stanford, CA, USA; Andreas E. Kyprianou, Bath, UK; Yves Le Jan, Orsay, France

Advisory Editors: Søren Asmussen, Aarhus, Denmark; Martin Hairer, Coventry, UK; Peter Jagers, Gothenburg, Sweden; Ioannis Karatzas, New York, NY, USA; Frank P. Kelly, Cambridge, UK; Bernt Øksendal, Oslo, Norway; George Papanicolaou, Stanford, CA, USA; Etienne Pardoux, Marseille, France; Edwin Perkins, Vancouver, Canada; Halil Mete Soner, Zürich, Switzerland

The Probability Theory and Stochastic Modelling series is a merger and continuation of Springer's two well-established series, Stochastic Modelling and Applied Probability and Probability and Its Applications. It publishes research monographs that make a significant contribution to probability theory or an applications domain in which advanced probability methods are fundamental.
Books in this series are expected to follow rigorous mathematical standards, while also displaying the expository quality necessary to make them useful and accessible to advanced students, as well as researchers. The series covers all aspects of modern probability theory including:

- Gaussian processes
- Markov processes
- Random fields, point processes and random sets
- Random matrices
- Statistical mechanics and random media
- Stochastic analysis

as well as applications that include (but are not restricted to):

- Branching processes and other models of population growth
- Communications and processing networks
- Computational methods in probability and stochastic processes, including simulation
- Genetics and other stochastic models in biology and the life sciences
- Information theory, signal processing, and image synthesis
- Mathematical economics and finance
- Statistical methods (e.g. empirical processes, MCMC)
- Statistics for stochastic processes
- Stochastic control
- Stochastic models in operations research and stochastic optimization
- Stochastic models in the physical sciences

More information about this series at http://www.springer.com/series/13205

Alexey Piunovskiy · Yi Zhang
Continuous-Time Markov Decision Processes: Borel Space Models and General Control Strategies
Foreword by Albert Nikolaevich Shiryaev

Alexey Piunovskiy, Department of Mathematical Sciences, University of Liverpool, Liverpool, UK
Yi Zhang, Department of Mathematical Sciences, University of Liverpool, Liverpool, UK

ISSN 2199-3130; ISSN 2199-3149 (electronic)
Probability Theory and Stochastic Modelling
ISBN 978-3-030-54986-2; ISBN 978-3-030-54987-9 (eBook)
https://doi.org/10.1007/978-3-030-54987-9

Mathematics Subject Classification: 90C40, 60J76, 62L10, 90C05, 90C29, 90C39, 90C46, 93C27, 93E20

© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Nothing is as useful as good theory—investigation of challenging real-life problems produces profound theories.
Guy Soulsby

Foreword

This monograph presents a systematic and modern treatment of continuous-time Markov decision processes. One can view the latter as a special class of (stochastic) optimal control problems. Thus, it is not surprising that the traditional method of investigation is dynamic programming. Another method, sometimes termed the convex analytic approach, proceeds via a reduction to linear programming, and is similar to and comparable with the weak formulation of optimal control problems.
On the other hand, this class of problems possesses its own features that can be employed to derive interesting and meaningful results, such as its connection to discrete-time problems. It is this connection that accounts for many modern developments or complete solutions to otherwise delicate problems in this topic.

The authors of this book are well-established researchers in its topic, to which, in recent years, they have made important contributions. So they are well positioned to compose this updated and timely presentation of the current state of the art of continuous-time Markov decision processes.

Turning to its content, this book presents three major methods of investigating continuous-time Markov decision processes: the dynamic programming approach, the linear programming approach and the method based on reduction to discrete-time problems. The performance criterion under primary consideration is the expected total cost, in addition to one chapter devoted to the long-run average cost. Both the unconstrained and constrained versions of the optimal control problem are studied. The issue at the core is the sufficient class of control strategies.

In terms of the technical level, in most cases, this book intends to present the results as generally as possible and under conditions as weak as possible. That means the authors consider Borel models under a wide class of control strategies. This is not only for the sake of generality; indeed, it simultaneously covers both the traditional class of relaxed controls for continuous-time models and randomized controls in semi-Markov decision processes. The more relevant reason is perhaps that it paves the way for a rigorous treatment of more involved issues concerning the realizability or the implementability of the control strategies.
In particular, mixed strategies, which were otherwise often introduced verbally, are now introduced as a subclass of control strategies, making use of an external space, an idea that can be ascribed to the works of I. V. Girsanov, N. V. Krylov, A. V. Skorokhod and I. I. Gikhman, and to the book Statistics of Random Processes by R. S. Liptser and myself for models of various degrees of generality, and which was further developed by E. A. Feinberg for discrete-time problems.

The authors have made this book self-contained: all the statements in the main text are proved in detail, and the appendices contain all the necessary facts from mathematical analysis, applied probability and discrete-time Markov decision processes. Moreover, the authors present numerous solved real-life and academic examples, illustrating how the theory can be used in practice.

The selection of the material seems to be balanced. It is natural that many statements presented and proved in this monograph come from the authors themselves, but the rest come from other researchers, to reflect the progress made both in the west and in the east. Moreover, it contains several statements unpublished elsewhere. Finally, the bibliographical remarks also contain useful information. No doubt, active researchers (from the level of graduate students onward) in the fields of applied probability, statistics and operational research, and in particular, stochastic optimal control, as well as statistical decision theory and sequential analysis, will find this monograph useful and valuable. I can recommend this book to any of them.

Albert Nikolaevich Shiryaev
Steklov Mathematical Institute, Russian Academy of Sciences, Moscow, Russia

Preface

The study of continuous-time Markov decision processes dates back at least to the 1950s, shortly after that of its discrete-time analogue.
Since then, the theory has rapidly developed and has found a large spectrum of applications to, for example, queueing systems, epidemiology, telecommunication and so on. In this monograph, we present some recent developments on selected topics in the theory of continuous-time Markov decision processes.

Prior to this book, there have been monographs [106, 150, 197] solely devoted to the theory of continuous-time Markov decision processes. They all focus on models with a finite or denumerable state space, with [150] also discussing semi-Markov decision processes and featuring applications to queueing systems. Here, we emphasize the word "solely" in the previous claim because, leaving aside those on controlled diffusion processes, there have also been important books on the more general class of controlled processes, see [46, 49], as well as the thesis [236]. These works are on piecewise deterministic processes and deal with problems without constraints. Here, we consider, in that language, piecewise constant processes in a Borel state space, but we pay special attention to problems with constraints and develop techniques tailored for our processes.

The authors of the books [106, 150, 197] followed a direct approach, in the sense that no reduction to discrete-time Markov decision processes is involved. Consequently, as far as the presentation is concerned, this approach has the desirable advantage of being self-contained. The main tool is the Dynkin formula, and so, to ensure that the class of functions of interest is in the domain of the extended generator of the controlled process, a weight function needs to be imposed on the cost and transition rates. In some parts of this book, we also present this approach and apply it to the study of constrained problems. Following the observation made in [230, 231, 264], we present necessary and sufficient conditions for the applicability of the Dynkin formula to the class of functions of interest. This hopefully leads to a clearer picture of what minimal conditions are needed for this approach to apply.
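For orientation, the Dynkin formula mentioned above can be stated, in one standard form, for a Markov process $X$ with extended generator $\mathcal{L}$ and initial state $x$; the notation here is illustrative and need not match the book's own conventions:

```latex
% Dynkin formula: for f in the domain of the extended generator L,
\mathbb{E}_x\!\left[ f(X_t) \right]
  = f(x) + \mathbb{E}_x\!\left[ \int_0^t (\mathcal{L} f)(X_s)\, \mathrm{d}s \right].
```

In the controlled setting, $\mathcal{L}$ depends on the strategy through the transition rates, and the weight-function conditions on the cost and transition rates serve precisely to guarantee that the functions of interest lie in the domain of the extended generator, so that this identity may be applied.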
On the other hand, the main theme of this book is the reduction method for continuous-time Markov decision processes. When this method is applicable, it often allows one to deduce optimality results under more general conditions on the system primitives. Another advantage is that it allows one to make full use of results known for discrete-time Markov decision processes, and referring to recent results of this kind makes the present book a more up-to-date treatment of continuous-time Markov decision processes.

In greater detail, a large part of this book is devoted to the justification of the reduction method and its application to problems with total (undiscounted) cost criteria. This performance criterion was rarely touched upon in [106, 150, 197]. Recently, a method for investigating the space of occupation measures for discrete-time Markov decision processes with total cost criteria has been described, see [61, 63]. The extension to continuous-time Markov decision processes with total cost criteria was carried out in [117, 185, 186]. Although the continuous-time Markov decision processes in [117, 185, 186] were all reduced to equivalent discrete-time Markov decision processes, leading to the same optimality results, different methods were pursued. In this book, we present in detail the method of [185, 186], because it is based on the introduction of a class of so-called Poisson-related strategies. This class of strategies is new to the context of continuous-time Markov decision processes. The advantage of this class of strategies is that they are implementable or realizable, in the sense that they induce action processes that are measurable. This realizability issue does not arise in discrete-time Markov decision processes, but is especially relevant to problems with constraints, where relaxed strategies often need to be considered for the sake of optimality.
Although it has long been known that relaxed strategies induce action processes with complicated trajectories, in the context of continuous-time Markov decision processes, it was [76] that drew special attention to this, and also constructed realizable optimal strategies, termed switching strategies, for discounted problems. By the way, in [76], a reduction method was developed for discounted problems, which is also presented in this book. This method is different from the standard uniformization technique. Although it is not directly applicable to the problem when the discount factor is null, our works [117, 185, 186] were motivated by it.

A different reduction method was followed in [45, 49], where the induced discrete-time Markov decision process has a more complicated action space (in the form of some space of measurable mappings) than the original continuous-time Markov decision process. The reduction method presented in this book is different, as it induces a discrete-time Markov decision process with the same action space as the original problem in continuous time.

An outline of the material presented in this book follows. In Chap. 1, we describe the controlled processes and introduce the primarily concerned class of strategies. We discuss their realizability and sufficiency for problems with total cost criteria. The latter is achieved by investigating the detailed occupation measures. A series of examples of continuous-time Markov decision processes can be found in this chapter, which illustrate the practical applications, many of which are solved
