MARKOV DECISION PROCESSES WITH THEIR APPLICATIONS

Advances in Mechanics and Mathematics
VOLUME 14

Series Editors:
David Y. Gao, Virginia Polytechnic Institute and State University, U.S.A.
Ray W. Ogden, University of Glasgow, U.K.

Advisory Editors:
I. Ekeland, University of British Columbia, Canada
S. Liao, Shanghai Jiao Tong University, P.R. China
K.R. Rajagopal, Texas A&M University, U.S.A.
T. Ratiu, Ecole Polytechnique, Switzerland
W. Yang, Tsinghua University, P.R. China

By
Prof. Ph.D. Qiying Hu, Fudan University, China
Prof. Ph.D. Wuyi Yue, Konan University, Japan

Library of Congress Control Number: 2006930245
ISBN-13: 978-0-387-36950-1
e-ISBN-13: 978-0-387-36951-8
Printed on acid-free paper.
AMS Subject Classifications: 90C40, 90C39, 93C65, 91B26, 90B25

© 2008 Springer Science+Business Media, LLC
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

springer.com

Contents

List of Figures
List of Tables
Preface
Acknowledgments

1. INTRODUCTION
   1 A Brief Description of Markov Decision Processes
   2 Overview of the Book
   3 Organization of the Book

2. DISCRETE TIME MARKOV DECISION PROCESSES: TOTAL REWARD
   1 Model and Preliminaries
      1.1 System Model
      1.2 Some Concepts
      1.3 Finiteness of the Reward
   2 Optimality Equation
      2.1 Validity of the Optimality Equation
      2.2 Properties of the Optimality Equation
   3 Properties of Optimal Policies
   4 Successive Approximation
   5 Sufficient Conditions
   6 Notes and References

3. DISCRETE TIME MARKOV DECISION PROCESSES: AVERAGE CRITERION
   1 Model and Preliminaries
   2 Optimality Equation
      2.1 Properties of ACOE and Optimal Policies
      2.2 Sufficient Conditions
      2.3 Recurrent Conditions
   3 Optimality Inequalities
      3.1 Conditions
      3.2 Properties of ACOI and Optimal Policies
   4 Notes and References

4. CONTINUOUS TIME MARKOV DECISION PROCESSES
   1 A Stationary Model: Total Reward
      1.1 Model and Conditions
      1.2 Model Decomposition
      1.3 Some Properties
      1.4 Optimality Equation and Optimal Policies
   2 A Nonstationary Model: Total Reward
      2.1 Model and Conditions
      2.2 Optimality Equation
   3 A Stationary Model: Average Criterion
   4 Notes and References

5. SEMI-MARKOV DECISION PROCESSES
   1 Model and Conditions
      1.1 Model
      1.2 Regular Conditions
      1.3 Criteria
   2 Transformation
      2.1 Total Reward
      2.2 Average Criterion
   3 Notes and References

6. MARKOV DECISION PROCESSES IN SEMI-MARKOV ENVIRONMENTS
   1 Continuous Time MDP in Semi-Markov Environments
      1.1 Model
      1.2 Optimality Equation
      1.3 Approximation by Weak Convergence
      1.4 Markov Environment
      1.5 Phase Type Environment
   2 SMDP in Semi-Markov Environments
      2.1 Model
      2.2 Optimality Equation
      2.3 Markov Environment
   3 Mixed MDP in Semi-Markov Environments
      3.1 Model
      3.2 Optimality Equation
      3.3 Markov Environment
   4 Notes and References

7. OPTIMAL CONTROL OF DISCRETE EVENT SYSTEMS: I
   1 System Model
   2 Optimality
      2.1 Maximum Discounted Total Reward
      2.2 Minimum Discounted Total Reward
   3 Optimality in Event Feedback Control
   4 Link to Logic Level
   5 Resource Allocation System
   6 Notes and References

8. OPTIMAL CONTROL OF DISCRETE EVENT SYSTEMS: II
   1 System Model
   2 Optimality Equation and Optimal Supervisors
   3 Language Properties
   4 System Based on Automaton
   5 Supervisory Control Problems
      5.1 Event Feedback Control
      5.2 State Feedback Control
   6 Job-Matching Problem
   7 Notes and References

9. OPTIMAL REPLACEMENT UNDER STOCHASTIC ENVIRONMENTS
   1 Optimal Replacement: Discrete Time
      1.1 Problem and Model
      1.2 Total Cost Criterion
      1.3 Average Criterion
   2 Optimal Replacement: Semi-Markov Processes
      2.1 Problem
      2.2 Optimal Control Limit Policies
      2.3 Markov Environment
      2.4 Numerical Example
   3 Notes and References

10. OPTIMAL ALLOCATION IN SEQUENTIAL ONLINE AUCTIONS
   1 Problem and Model
   2 Analysis for Private Reserve Price
   3 Analysis for Announced Reserve Price
   4 Monotone Properties
   5 Numerical Results
   6 Notes and References

References
Index

List of Figures

1.1 The flowchart of the chapters.
7.1 A resource allocation system: the DES model.
8.1 A job-matching problem: the automaton G.
10.1 Optimal allocation $s_n^*(i)$ versus number of total available items with $n$.
10.2 Maximal expected total profit $V_n(35)$ versus number of remaining auctions with $\lambda$.
10.3 Maximal expected total profit $V_5(35)$ versus reserve with $\lambda$.