International Series in Operations Research & Management Science

Richard J. Boucherie • Nico M. van Dijk, Editors

Markov Decision Processes in Practice

International Series in Operations Research & Management Science, Volume 248

Series Editor: Camille C. Price, Stephen F. Austin State University, TX, USA
Associate Series Editor: Joe Zhu, Worcester Polytechnic Institute, MA, USA
Founding Series Editor: Frederick S. Hillier, Stanford University, CA, USA

More information about this series at http://www.springer.com/series/6161

Editors
Richard J. Boucherie, Stochastic Operations Research, University of Twente, Enschede, The Netherlands
Nico M. van Dijk, Stochastic Operations Research, University of Twente, Enschede, The Netherlands

ISSN 0884-8289    ISSN 2214-7934 (electronic)
International Series in Operations Research & Management Science
ISBN 978-3-319-47764-0    ISBN 978-3-319-47766-4 (eBook)
DOI 10.1007/978-3-319-47766-4
Library of Congress Control Number: 2017932096

© Springer International Publishing AG 2017
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Carla, Fabian, Daphne, Deirdre, and Daniël – Thanks for being there in difficult times,
Richard

P. Dorreboom and his daughter – for coping with my passions,
Nico

Foreword

I had the pleasure of serving as the series editor of this series over its first 20 years (from 1993 through October 2013). One of the special pleasures of this work was the opportunity to become better acquainted with many of the leading researchers in our field and to learn more about their research. This was especially true in the case of Nico M. van Dijk, who became a friend and overnight guest in our home. I then was delighted when Nico and his colleague, Richard J. Boucherie, agreed to be the editors of a handbook, Queueing Networks: A Fundamental Approach, that was published in 2010 as Vol. 154 in this series. This outstanding volume succeeded in defining the current state of the art in this important area.

Because of both their elegance and their great application potential, Markov decision processes have been one of my favorite areas of operations research. A full chapter (Chap. 19 in the current tenth edition) is devoted to this topic in my textbook (coauthored by the late Gerald J. Lieberman), Introduction to Operations Research. However, I have long been frustrated by the sparsity of publications that describe applications of Markov decision processes.
This was less true about 30 years ago, when D.J. White published his seminal papers on such real applications in Interfaces (see the November–December 1985 and September–October 1988 issues). Unfortunately, relatively few papers or books since then have delved much into such applications. (One of these few publications is the 2002 book edited by Eugene Feinberg and Adam Shwartz, Handbook of Markov Decision Processes: Methods and Applications, which is Vol. 40 in this series.)

Given the sparse literature in this important area, I was particularly delighted when the outstanding team of Nico M. van Dijk and Richard J. Boucherie accepted my invitation to be the editors of this exciting new book that focuses on Markov decision processes in practice. One of my last acts as the series editor was to work with these coeditors and the publisher in shepherding the book proposal through the process of providing the contract for its publication. I feel that this book may prove to be one of the most important books in the series because it sheds so much light on the great application potential of Markov decision processes. This hopefully will lead to a renaissance in applying this powerful technique to numerous real problems.

Frederick S. Hillier
Stanford University
July 2016

Preface

It has been over 30 years since D.J. White started his series of surveys on practical applications of Markov decision processes (MDP),¹⁻³ over 20 years since the phenomenal book by Martin Puterman on the theory of MDP,⁴ and over 10 years since Eugene A. Feinberg and Adam Shwartz published their Handbook of Markov Decision Processes: Methods and Applications.⁵ In the past decades, the practical development of MDP seemed to have come to a halt, with the general perception that MDP is computationally prohibitive. Accordingly, MDP is deemed unrealistic and out of scope for many operations research practitioners. In addition, MDP is hampered by its notational complications and its conceptual complexity.
As a result, MDP is often only briefly covered in introductory operations research textbooks and courses. Recently developed approximation techniques, supported by vastly increased numerical power, have tackled part of the computational problems; see, e.g., Chaps. 2 and 3 of this handbook and the references therein. This handbook shows that a revival of MDP for practical purposes is justified for several reasons:

1. First and above all, present-day numerical capabilities have enabled MDP to be invoked for real-life applications.
2. MDP allows one to develop and formally support approximate and simple practical decision rules.
3. Last but not least, MDP's probabilistic modeling of practical problems is a skill, if not an art, in itself.

¹ D.J. White. Real applications of Markov decision processes. Interfaces, 15:73–83, 1985.
² D.J. White. Further real applications of Markov decision processes. Interfaces, 18:55–61, 1988.
³ D.J. White. A survey of applications of Markov decision processes. Journal of the Operational Research Society, 44:1073–1096, 1993.
⁴ Martin Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
⁵ Eugene A. Feinberg and Adam Shwartz, editors. Handbook of Markov Decision Processes: Methods and Applications. Kluwer, 2002.

This handbook, Markov Decision Processes in Practice, aims to show the power of classical MDP for real-life applications and optimization. The handbook is structured as follows:

Part I: General Theory
Part II: Healthcare
Part III: Transportation
Part IV: Production
Part V: Communications
Part VI: Financial Modeling

The chapters of Part I are devoted to the state-of-the-art theoretical foundation of MDP, including approximate methods such as policy improvement, successive approximation, and infinite state spaces, as well as an instructive chapter on approximate dynamic programming.
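As a rough illustration of the successive approximation idea mentioned for Part I, here is a minimal Python sketch, not taken from any chapter of this handbook, of value iteration for a finite average-cost MDP. It assumes a unichain, aperiodic model stored as plain lists (`P[a][i][j]` for transition probabilities and `c[a][i]` for one-step costs; these layouts and the name `successive_approximation` are hypothetical). Under these assumptions the minimal average cost lies between the smallest and largest one-step change min_i and max_i of V_n(i) − V_{n−1}(i), and the iteration stops once these two bounds are close.

```python
from typing import List, Tuple


def successive_approximation(
    P: List[List[List[float]]],  # P[a][i][j]: probability of moving i -> j under action a (assumed layout)
    c: List[List[float]],        # c[a][i]: one-step cost of action a in state i (assumed layout)
    eps: float = 1e-8,
    max_iter: int = 100_000,
) -> Tuple[float, float, List[int]]:
    """Value iteration for a unichain, aperiodic average-cost MDP.

    Returns (lower, upper, policy): bounds on the minimal average cost and
    the policy that is greedy with respect to the last value vector.
    """
    n_actions, n_states = len(P), len(P[0])
    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v: List[float] = []
        policy: List[int] = []
        for i in range(n_states):
            # One-stage look-ahead: immediate cost plus expected value of the next state.
            q = [c[a][i] + sum(P[a][i][j] * v[j] for j in range(n_states))
                 for a in range(n_actions)]
            best = min(range(n_actions), key=lambda a: q[a])
            new_v.append(q[best])
            policy.append(best)
        # The minimal average cost lies between the smallest and largest one-step change.
        diffs = [new_v[i] - v[i] for i in range(n_states)]
        lower, upper = min(diffs), max(diffs)
        # Shift by a reference state so the iterates stay bounded; this leaves later
        # differences unchanged because every row P[a][i] sums to one.
        v = [x - new_v[0] for x in new_v]
        if upper - lower <= eps * max(abs(lower), 1.0):
            return lower, upper, policy
    raise RuntimeError("successive approximation did not converge within max_iter")


if __name__ == "__main__":
    # Made-up two-state, two-action example.
    P = [[[0.9, 0.1], [0.2, 0.8]],   # action 0
         [[0.9, 0.1], [0.9, 0.1]]]   # action 1
    c = [[1.0, 3.0],                 # action 0
         [2.0, 4.0]]                 # action 1
    lower, upper, policy = successive_approximation(P, c)
    print(f"{lower:.6f} <= g* <= {upper:.6f}, policy = {policy}")
```

With these made-up numbers the two bounds agree at an average cost of 1.3 after a few iterations, and the greedy policy uses action 0 in state 0 and action 1 in state 1. Subtracting the value of a reference state is the usual relative value iteration device; it changes nothing in the bounds but keeps the iterates from growing linearly.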
Parts II–VI contain a collection of state-of-the-art applications, in a non-exhaustive selection of application areas, in which MDP was key to the solution approach. The application-oriented chapters have the following structure:

• Problem description
• MDP formulation
• MDP solution approach
• Numerical and practical results
• Evaluation of the MDP approach used

In addition to the MDP formulation and justification, most chapters contain numerical results and a real-life validation or implementation of the results. Some of the chapters are based on previously published results, some expand on earlier work, and some contain new research. All chapters have been thoroughly reviewed. To facilitate comparison of the results offered in different chapters, several chapters contain an appendix with notation or a transformation of their notation to the basic notation provided in Appendix A. Appendix B contains a compact overview of all chapters, listing discrete or continuous modeling aspects and the optimization criteria used in the different chapters.

The outline of these six parts is provided below.

Part I: General Theory

This part contains the following chapters:

Chapter 1: One-Step Improvement Ideas and Computational Aspects
Chapter 2: Value Function Approximation in Complex Queueing Systems
Chapter 3: Approximate Dynamic Programming by Practical Examples
Chapter 4: Server Optimization of Infinite Queueing Systems
Chapter 5: Structures of Optimal Policies in MDP with Unbounded Jumps: The State of Our Art

The first chapter, by H.C. Tijms, presents a survey of the basic concepts underlying computational approaches for MDP. The focus is on the basic principle of policy improvement, the design of a single good improvement step, and one-stage-look-ahead rules, which can be used, e.g., to generate the best control rule for the specific problem of interest, to obtain decomposition results or parameterizations, and to develop a heuristic or tailor-made rule. Several intriguing queueing examples are included, e.g., dynamic routing to parallel queues.

In the second chapter, by S.
Bhulai, one-step policy improvement is brought down to its essence: understanding and evaluating the relative value function of simple systems that can then be used in the control of more complicated systems. First, the essence of this relative value function is clarified for standard birth-death M/M/s queueing systems. Next, a number of approximations for the relative value function are provided and applied to more complex queueing systems, such as dynamic routing in real-life multiskilled call centers.

Chapter 3, by Martijn Mes and Arturo Pérez Rivera, continues the approximation approach and presents approximate dynamic programming (ADP) as a powerful technique to solve large-scale discrete-time multistage stochastic control problems. Rather than taking a more fundamental approach, as can be found, for example, in the excellent book by Warren B. Powell,⁶ this chapter illustrates the basic principles of ADP via three different practical examples: the nomadic trucker, freight consolidation, and tactical planning in healthcare.

The special but quite natural complication of infinite state spaces within MDP receives particular attention in two consecutive chapters. First, in Chap. 4, by András Mészáros and Miklós Telek, the regular structure of several Markovian models is exploited to decompose an infinite transition matrix into a controllable and an uncontrollable part, which reduces the otherwise unsolvable infinite MDP to a numerically solvable one. The approach is illustrated via queueing systems with parallel servers and a computer system with a power-saving mode and, in a more theoretical setting, for birth-death and quasi-birth-death models.

Next, in Chap. 5, by Herman Blok and Floske Spieksma, the emphasis is on structural properties of infinite MDPs with unbounded jumps. Using a running example, the chapter addresses the natural question of how structural properties of the optimal policy are preserved under truncation or perturbation of the MDP. In particular, smoothed rate truncation (SRT) is discussed, and a roadmap is provided for preserving structural properties.
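The one-step improvement idea summarized above for Chapters 1 and 2 can be made concrete with a small textbook-style case, sketched here in Python under stated assumptions and not taken from those chapters. Assume Poisson arrivals at rate λ are routed to parallel single-server queues with exponential service rates μ_i, every customer present costs 1 per unit time, and the initial policy is a static Bernoulli split with probabilities p_i. Under that split each queue is an independent M/M/1 queue with arrival rate λ_i = λp_i and relative value function h_i(x) = x(x+1)/(2(μ_i − λ_i)), so a single improvement step sends an arrival to the queue with the smallest increment h_i(x+1) − h_i(x) = (x+1)/(μ_i − λ_i). The function names below are hypothetical.

```python
from typing import Sequence


def mm1_relative_value(x: int, lam: float, mu: float) -> float:
    """Relative value h(x) of an M/M/1 queue with holding cost 1 per customer per unit time, h(0) = 0.

    h solves x - g + lam*(h(x+1) - h(x)) + mu*[x > 0]*(h(x-1) - h(x)) = 0 with g = lam/(mu - lam).
    """
    if lam >= mu:
        raise ValueError("the queue must be stable (lam < mu)")
    return x * (x + 1) / (2.0 * (mu - lam))


def improved_route(state: Sequence[int], lam: float, mus: Sequence[float],
                   split: Sequence[float]) -> int:
    """One policy-improvement step on top of static Bernoulli routing (illustrative sketch).

    state[i] is the number of customers at queue i, mus[i] its service rate, and
    split[i] the probability that the initial policy sends an arrival to queue i.
    Returns the index of the queue to which the improved policy routes an arrival.
    """
    increments = []
    for x, mu, p in zip(state, mus, split):
        lam_i = lam * p  # arrival rate seen by queue i under the initial split
        # Extra long-run cost of admitting one more customer to this queue.
        increments.append(mm1_relative_value(x + 1, lam_i, mu) - mm1_relative_value(x, lam_i, mu))
    # min returns the first minimizer, so ties go to the lower queue index.
    return min(range(len(increments)), key=lambda i: increments[i])


if __name__ == "__main__":
    # Made-up rates: arrivals at rate 1, a fast and a slow server, initial split 2/3 : 1/3.
    for s in [(0, 0), (2, 0), (0, 3)]:
        print(s, "-> queue", improved_route(s, 1.0, (2.0, 1.0), (2 / 3, 1 / 3)))
```

With these made-up rates the increments are 0.75(x₁ + 1) for the fast queue and 1.5(x₂ + 1) for the slow one, so the improved policy is a state-dependent switching rule rather than the static split it started from. Only the relative values of the simple single-queue systems are needed, which is what makes this kind of improvement step cheap to evaluate.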
⁶ Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley Series in Probability and Statistics, 2011.