Planning with Markov Decision Processes: An AI Perspective (PDF)

197 Pages · 2012 · 1.23 MB · English
by Mausam and Andrey Kolobov

Preview of Planning with Markov Decision Processes: An AI Perspective

Planning with Markov Decision Processes: An AI Perspective

Mausam and Andrey Kolobov
University of Washington

SYNTHESIS LECTURES ON ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING #17
Morgan & Claypool Publishers

Copyright © 2012 by Morgan & Claypool
Planning with Markov Decision Processes: An AI Perspective
Mausam and Andrey Kolobov
www.morganclaypool.com

ISBN: 9781608458868 (paperback)
ISBN: 9781608458875 (ebook)
DOI: 10.2200/S00426ED1V01Y201206AIM017

A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, Lecture #17
Series Editors: Ronald J. Brachman (Yahoo Research), William W. Cohen (Carnegie Mellon University), Thomas Dietterich (Oregon State University)
Series ISSN: Print 1939-4608, Electronic 1939-4616

ABSTRACT

Markov Decision Processes (MDPs) are widely popular in Artificial Intelligence for modeling sequential decision-making scenarios with probabilistic dynamics. They are the framework of choice when designing an intelligent agent that needs to act for long periods of time in an environment where its actions could have uncertain outcomes. MDPs are actively researched in two related subareas of AI, probabilistic planning and reinforcement learning. Probabilistic planning assumes known models for the agent's goals and domain dynamics, and focuses on determining how the agent should behave to achieve its objectives. On the other hand, reinforcement learning additionally learns these models based on the feedback the agent gets from the environment.

This book provides a concise introduction to the use of MDPs for solving probabilistic planning problems, with an emphasis on the algorithmic perspective. It covers the whole spectrum of the field, from the basics to state-of-the-art optimal and approximation algorithms. We first describe the theoretical foundations of MDPs and the fundamental solution techniques for them. We then discuss modern optimal algorithms based on heuristic search and the use of structured representations. A major focus of the book is on the numerous approximation schemes for MDPs that have been developed in the AI literature. These include determinization-based approaches, sampling techniques, heuristic functions, dimensionality reduction, and hierarchical representations. Finally, we briefly introduce several extensions of the standard MDP classes that model and solve even more complex planning problems.

KEYWORDS

MDP, AI planning, probabilistic planning, uncertainty in AI, sequential decision making under uncertainty, reinforcement learning

Contents

Preface

1 Introduction
  1.1 Characteristics of an MDP
  1.2 Connections with Different Fields
  1.3 Overview of this Book

2 MDPs
  2.1 Markov Decision Processes: Definition
  2.2 Solutions of an MDP
  2.3 Solution Existence
    2.3.1 Expected Linear Additive Utility and the Optimality Principle
    2.3.2 Finite-Horizon MDPs
    2.3.3 Infinite-Horizon Discounted-Reward MDPs
    2.3.4 Indefinite-Horizon MDPs
  2.4 Stochastic Shortest-Path MDPs
    2.4.1 Definition
    2.4.2 Stochastic Shortest-Path MDPs and Other MDP Classes
  2.5 Factored MDPs
    2.5.1 Factored Stochastic Shortest-Path MDPs
    2.5.2 PPDDL-style Representation
    2.5.3 RDDL-style Representation
    2.5.4 Factored Representations and Solving MDPs
  2.6 Complexity of Solving MDPs

3 Fundamental Algorithms
  3.1 A Brute-Force Algorithm
  3.2 Policy Evaluation
    3.2.1 Policy Evaluation by Solving a System of Equations
    3.2.2 An Iterative Approach to Policy Evaluation
  3.3 Policy Iteration
    3.3.1 Modified Policy Iteration
    3.3.2 Limitations of Policy Iteration
  3.4 Value Iteration
    3.4.1 Bellman Equations
    3.4.2 The Value Iteration Algorithm
    3.4.3 Theoretical Properties
    3.4.4 Asynchronous Value Iteration
  3.5 Prioritization in Value Iteration
    3.5.1 Prioritized Sweeping
    3.5.2 Improved Prioritized Sweeping
    3.5.3 Focused Dynamic Programming
    3.5.4 Backward Value Iteration
    3.5.5 A Comparison of Prioritization Algorithms
  3.6 Partitioned Value Iteration
    3.6.1 Topological Value Iteration
    3.6.2 External-Memory / Cache-Efficient Algorithms
    3.6.3 Parallelization of Value Iteration
  3.7 Linear Programming Formulation
  3.8 Infinite-Horizon Discounted-Reward MDPs
    3.8.1 Bellman Equations
    3.8.2 Value/Policy Iteration
    3.8.3 Prioritized and Partitioned Algorithms
  3.9 Finite-Horizon MDPs
  3.10 MDPs with Dead Ends
    3.10.1 Finite-Penalty SSP MDPs with Dead Ends

4 Heuristic Search Algorithms
  4.1 Heuristic Search and SSP MDPs
  4.2 FIND-and-REVISE: a Schema for Heuristic Search
  4.3 LAO* and Extensions
    4.3.1 LAO*
    4.3.2 ILAO*
    4.3.3 BLAO* and RLAO*: Expanding the Reverse Envelope
    4.3.4 AO*: Heuristic Search for Acyclic MDPs
  4.4 RTDP and Extensions
    4.4.1 RTDP
    4.4.2 LRTDP
    4.4.3 BRTDP, FRTDP, VPI-RTDP: Adding an Upper Bound
  4.5 Heuristics and Transition Graph Pruning
    4.5.1 Action Elimination
    4.5.2 Focused Topological Value Iteration
  4.6 Computing Admissible Heuristics
    4.6.1 Adapting Classical Planning Heuristics to MDPs
    4.6.2 The h_aodet Heuristic
    4.6.3 The h_max Heuristic
  4.7 Heuristic Search and Dead Ends
    4.7.1 The Case of Avoidable Dead Ends
    4.7.2 The Case of Unavoidable Dead Ends

5 Symbolic Algorithms
  5.1 Algebraic Decision Diagrams
    5.1.1 The REDUCE Operator
    5.1.2 The APPLY Operator
    5.1.3 Other ADD Operators
  5.2 SPUDD: Value Iteration using ADDs
  5.3 Symbolic LAO*
  5.4 Other Symbolic Algorithms
  5.5 Other Symbolic Representations
  5.6 Approximations using Symbolic Approaches

6 Approximation Algorithms
  6.1 Determinization-based Techniques
    6.1.1 FF-Replan
    6.1.2 FF-Hindsight
    6.1.3 RFF
    6.1.4 HMDPP
    6.1.5 Determinization-based Approximations and Dead Ends
  6.2 Sampling-based Techniques
    6.2.1 UCT
  6.3 Heuristic Search with Inadmissible Heuristics
    6.3.1 h_add
    6.3.2 h_FF
    6.3.3 h_GOTH
  6.4 Dimensionality Reduction-based Techniques
    6.4.1 ReTrASE
    6.4.2 Approximate Policy Iteration and Linear Programming
    6.4.3 FPG
  6.5 Hierarchical Planning
    6.5.1 Options
    6.5.2 Task Hierarchy
    6.5.3 Hierarchy of Abstract Machines
    6.5.4 Other Approaches
    6.5.5 State Abstraction in Hierarchical MDPs
    6.5.6 Learning Hierarchical Knowledge
    6.5.7 Discussion
  6.6 Hybridized Planning
  6.7 A Comparison of Different Algorithms

7 Advanced Notes
  7.1 MDPs with Continuous or Hybrid States
    7.1.1 Value Function Representations
    7.1.2 Heuristic Search for Hybrid MDPs
    7.1.3 Continuous Actions
  7.2 MDPs with Concurrency and Durative Actions
    7.2.1 Durative Actions
    7.2.2 Concurrent Actions
    7.2.3 Concurrent, Durative Actions
  7.3 Relational MDPs
    7.3.1 Solution Representations
    7.3.2 Algorithms
  7.4 Generalized Stochastic Shortest Path MDPs
    7.4.1 Mathematical Properties
    7.4.2 Algorithms for GSSP MDPs
    7.4.3 SixthSense: A Heuristic for Identifying Dead Ends
  7.5 Other Models
  7.6 Issues in Probabilistic Planning
    7.6.1 The Importance of Planning Competitions
    7.6.2 The Bane of Many Conferences
  7.7 Summary

Bibliography

Preface

Starting in the 1950s, a number of books have been written over time on Markov Decision Processes in different fields of research. However, most books have taken a rather theoretical view of the topic, delving deep into the fundamental insights, elaborating on the underlying mathematical principles, and proving each detail to give the framework the theoretical consistency that is expected of such a model.

Comparatively, there is less synthesized literature available on the use of MDPs within AI. While the reinforcement learning perspective has been published in a couple of books and surveys, the lack of surveys is especially glaring from the probabilistic planning point of view.

Our book differs from the existing literature on MDPs in other fields with its emphasis on algorithmic techniques. We start with the fundamental algorithms, but go far beyond them and survey the multitude of approaches proposed by AI researchers for scaling the solution algorithms to larger problems. Wherever necessary, we do present the theoretical results, but in line with our focus, we avoid the proofs and point to other literature for an in-depth analysis.

We make no assumptions about the reader's prior knowledge of MDPs, and expect this book to be of value to a beginning student interested in learning about MDPs, to a mid-level researcher who wishes to get an overview of the MDP solution techniques, as well as to a seasoned researcher interested in the references to advanced techniques from where she can launch further study.
Our book comprises seven chapters. We provide a general introduction to the book and MDPs in the first chapter. Chapter 2 defines the representation of an MDP model. There is an important issue here. Various researchers have studied slightly different problem definitions of an MDP, and not all results and algorithms apply to all versions. To keep the book coherent, we chose the stochastic shortest path (SSP) formalism as our base MDP. This generalizes several common MDP models, but some other advanced models are out of its purview.

After the model definition, we focus the next three chapters on the optimal solution algorithms. Chapter 3 starts from the fundamental algorithms from the 1950s and leads up to a variety of recent optimizations that scale them. Heuristic search ideas from the AI literature are incorporated on top of these algorithms in Chapter 4. These ideas are useful when a specific start state is known to us. Chapter 5 describes the use of compact value function representations in optimal MDP algorithms.

An optimal solution to an MDP is a luxury. Most real problems are so large that computing optimal solutions is infeasible. A significant emphasis of our book is Chapter 6, which discusses the state of the art in approximately solving these MDPs. Researchers have proposed a wide range of algorithms, which span various points on the efficiency-optimality spectrum. This chapter surveys these algorithms.

Finally, Chapter 7 briefly discusses several models that relax the various assumptions implicit in an MDP. Continuous and hybrid state spaces, concurrent, durative actions, and generalizations of SSPs are some of the MDP models that we discuss in the last chapter.

We would like to acknowledge several colleagues and researchers who gave us useful feedback on earlier drafts of the book or helped us with specific concepts. These include Chris Lin (University of Washington), Dan Weld (University of Washington), Dimitri Bertsekas (Massachusetts Institute of Technology), Eric Hansen (Mississippi State University), Hector Geffner (Universitat Pompeu Fabra), Martine De Cock (Ghent University), Peng Dai (Google), Scott Sanner (National ICT Australia Ltd), Florent Teichteil-Königsbuch (ONERA), and the anonymous reviewer. We also thank Tom Dietterich (Oregon State University) for his enthusiasm for the initial idea and Mike Morgan for his constant help while writing the book.

The writing of this book has been supported by the NSF grant IIS-1016465, ONR grant N000140910051, and the Turing Center at the University of Washington. Any opinions or conclusions expressed in the text are those of the authors and do not necessarily reflect the views of the funding agencies.

Mausam and Andrey Kolobov
June 2012

CHAPTER 1

Introduction

The vision of artificial intelligence is often manifested through an autonomous agent in a complex and uncertain environment. The agent is capable of thinking ahead and acting for long periods of time in accordance with its goal/objective. Such agents appear in a broad set of applications, for example, the Mars rover planning its daily schedule of activities [166], planning of military operations [2], robocup soccer [223], an agent playing games like blackjack [195], a set of elevators operating in sync [57], and intervention of cellular processes [47].
The AI sub-field of Automated Planning under Uncertainty tackles several core problems in the design of such an agent. These planning problems are typically formulated as an instance of a Markov Decision Process, or an MDP. At the highest level, an MDP comprises a set of world states, a set of actions under the agent's control, a transition model describing the probability of transitioning to a new state when taking an action in the current state, and an objective function, e.g., maximizing the sum of rewards obtained over a sequence of time steps. An MDP solution determines the agent's actions at each decision point. An optimal MDP solution is one that optimizes the objective function.

The MDP model was popularized in the operations research literature with the early works of Bellman and Howard in the 1950s [18; 114]. For the first thirty or so years after this, there was significant progress in building the theory of MDPs and a basic set of algorithms to solve them [83]. The AI community adopted the model in the early 1990s, with the earliest works exploring connections to the popular classical planning paradigm (which ignores the uncertainty in action outcomes) [36; 70; 74; 129].

Since then, MDPs have been immensely popular in AI, primarily in two related sub-communities: probabilistic planning and reinforcement learning. The probabilistic planning literature assumes complete prior knowledge of the MDP model and focuses on developing computationally efficient approaches to solve it. On the other hand, reinforcement learning studies the harder problem in which the agent does not have prior access to the complete model and, hence, also has to learn (parts of) it based on its experience. In this book, we survey the state-of-the-art techniques developed in the probabilistic planning literature.

1.1 CHARACTERISTICS OF AN MDP

What kinds of domains are best modeled as an MDP? We answer the question below and use two scenarios as our running examples.

The first example is of an agent playing the game of blackjack. The goal of the game is to acquire playing cards such that the sum of the scores is higher than the dealer's cards, but not over 21.
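As an illustration of the MDP ingredients described above (world states, actions, a probabilistic transition model, and a reward-based objective), here is a small Python sketch, not taken from the book: it encodes a made-up two-state MDP and solves it with value iteration, the fundamental algorithm the book covers in Chapter 3. Every name in it (the MDP class, value_iteration, the toy states and actions) is invented purely for this example.

```python
from dataclasses import dataclass

# A tiny, illustrative MDP: states, actions, a probabilistic transition model,
# rewards, and a discount factor (infinite-horizon discounted-reward setting).
@dataclass
class MDP:
    states: list          # set of world states
    actions: list         # actions under the agent's control
    transition: dict      # transition[s][a] -> list of (next_state, probability)
    reward: dict          # reward[(s, a)] -> immediate reward
    gamma: float = 0.95   # discount factor


def value_iteration(mdp, tol=1e-6):
    """Compute an (approximately) optimal value function and a greedy policy."""
    V = {s: 0.0 for s in mdp.states}
    while True:
        delta = 0.0
        for s in mdp.states:
            # Bellman backup: best expected value over all actions in state s.
            best = max(
                mdp.reward[(s, a)]
                + mdp.gamma * sum(p * V[s2] for s2, p in mdp.transition[s][a])
                for a in mdp.actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:   # stop when the Bellman residual is small enough
            break
    # Extract a greedy policy with respect to the converged value function.
    policy = {
        s: max(
            mdp.actions,
            key=lambda a: mdp.reward[(s, a)]
            + mdp.gamma * sum(p * V[s2] for s2, p in mdp.transition[s][a]),
        )
        for s in mdp.states
    }
    return V, policy


# Usage on a made-up two-state, two-action domain.
toy = MDP(
    states=["s0", "s1"],
    actions=["stay", "move"],
    transition={
        "s0": {"stay": [("s0", 1.0)], "move": [("s1", 0.8), ("s0", 0.2)]},
        "s1": {"stay": [("s1", 1.0)], "move": [("s0", 1.0)]},
    },
    reward={("s0", "stay"): 0.0, ("s0", "move"): -1.0,
            ("s1", "stay"): 2.0, ("s1", "move"): -1.0},
)
V, pi = value_iteration(toy)
print(V)   # approximate optimal state values
print(pi)  # greedy policy
```

In this toy domain the resulting policy moves from s0 to s1 (despite the immediate cost) and then stays there to collect the positive reward, which is the kind of long-term, probability-weighted reasoning that an optimal MDP solution captures.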
