ebook img

A Concise Introduction to Decentralized POMDPs PDF

146 Pages·2016·2.316 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview A Concise Introduction to Decentralized POMDPs

SPRINGER BRIEFS IN INTELLIGENT SYSTEMS ARTIFICIAL INTELLIGENCE, MULTIAGENT SYSTEMS, AND COGNITIVE ROBOTICS Frans A. Oliehoek Christopher Amato A Concise Introduction to Decentralized POMDPs 123 SpringerBriefs in Intelligent Systems fi Arti cial Intelligence, Multiagent Systems, and Cognitive Robotics Series editors Gerhard Weiss, Maastricht University, Maastricht, The Netherlands Karl Tuyls, University of Liverpool, Liverpool, UK Editorial Board Felix Brandt, Technische Universität München, Munich, Germany Wolfram Burgard, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany Marco Dorigo, Université libre de Bruxelles, Brussels, Belgium Peter Flach, University of Bristol, Bristol, UK Brian Gerkey, Open Source Robotics Foundation, Bristol, UK Nicholas R. Jennings, Southampton University, Southampton, UK Michael Luck, King’s College London, London, UK Simon Parsons, City University of New York, New York, US Henri Prade, IRIT, Toulouse, France Jeffrey S. Rosenschein, Hebrew University of Jerusalem, Jerusalem, Israel Francesca Rossi, University of Padova, Padua, Italy Carles Sierra, IIIA-CSIC Cerdanyola, Barcelona, Spain Milind Tambe, USC, Los Angeles, US Makoto Yokoo, Kyushu University, Fukuoka, Japan This series covers the entire research and application spectrum of intelligent systems, including artificial intelligence, multiagent systems, and cognitive robotics. Typical texts for publication in the series include, but are not limited to, state-of-the-art reviews, tutorials, summaries, introductions, surveys, and in-depth case and application studies of established or emerging fields and topics in the realm of computational intelligent systems. Essays exploring philosophical and societal issues raised by intelligent systems are also very welcome. More information about this series at http://www.springer.com/series/11845 Frans A. Oliehoek Christopher Amato (cid:129) A Concise Introduction to Decentralized POMDPs 123 FransA.Oliehoek Christopher Amato SchoolofElectricalEngineering,Electronics ComputerScienceandArtificialIntelligence andComputer Science Laboratory University of Liverpool MIT Liverpool Cambridge, MA UK USA ISSN 2196-548X ISSN 2196-5498 (electronic) SpringerBriefs inIntelligent Systems ISBN978-3-319-28927-4 ISBN978-3-319-28929-8 (eBook) DOI 10.1007/978-3-319-28929-8 LibraryofCongressControlNumber:2016941071 ©TheAuthor(s)2016 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpart of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission orinformationstorageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilar methodologynowknownorhereafterdeveloped. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publicationdoesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfrom therelevantprotectivelawsandregulationsandthereforefreeforgeneraluse. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authorsortheeditorsgiveawarranty,expressorimplied,withrespecttothematerialcontainedhereinor foranyerrorsoromissionsthatmayhavebeenmade. Printedonacid-freepaper ThisSpringerimprintispublishedbySpringerNature TheregisteredcompanyisSpringerInternationalPublishingAGSwitzerland Dedicatedtofuturegenerationsofintelligent decisionmakers. Preface Thisbookpresentsanoverviewofformaldecisionmakingmethodsfordecentral- ized cooperative systems. It is aimed at graduate students and researchers in the fieldsofartificialintelligenceandrelatedfieldsthatdealwithdecisionmaking,such asoperationsresearchandcontroltheory.Whilewehavetriedtomakethebookrel- ativelyself-contained,wedoassumesomeamountofbackgroundknowledge. Inparticular,weassumethatthereaderisfamiliarwiththeconceptofanagentas wellassearchtechniques(likedepth-firstsearch,A*,etc.),bothofwhicharestan- dardinthefieldofartificialintelligence[RussellandNorvig,2009].Additionally, we assume that the reader has a basic background in probability theory. Although wegiveaveryconcisebackgroundinrelevantsingle-agentmodels(i.e.,the‘MDP’ and ‘POMDP’ frameworks), a more thorough understanding of those frameworks would benefit the reader. A good first introduction to these concepts can be found inthetextbookbyRussellandNorvig,withadditionaldetailsintextsbySuttonand Barto[1998],Kaelblingetal.[1998],Spaan[2012]andKochenderferetal.[2015]. We alsoassume thatthe readerhas abasic background ingame theoryand game- theoreticnotationslikeNashequilibriumandParetoefficiency.Eventhoughthese conceptsarenotcentraltoourexposition,wedoplacetheDec-POMDPmodelin themoregeneralcontexttheyoffer.Foranexplanationoftheseconcepts,thereader could refer to any introduction on game theory, such as those by Binmore [1992], OsborneandRubinstein[1994]andLeyton-BrownandShoham[2008]. This book heavily builds upon earlier texts by the authors. In particular, many partswerebasedontheauthors’previoustheses,bookchaptersandsurveyarticles [Oliehoek, 2010, 2012, Amato, 2010, 2015, Amato et al., 2013]. This also means that,eventhoughwehavetriedtogivearelativelycompleteoverviewofthework inthefield,thetextinsomecasesisbiasedtowardsexamplesandmethodsthathave been considered by the authors. For the description of further topics in Chapter 8, we have selected those that we consider important and promising for future work. Clearly, there is a necessarily large overlap between these topics and the authors’ recentworkinthefield. vii Acknowledgments Writing a book is not a standalone activity; it builds upon all the insights devel- opedintheinteractionswithpeers,reviewersandcoathors.Assuch,wearegrateful fortheinteractionwehavehadwiththeentireresearchfield.Wespecificallywant to thank the attendees and organizers of the workshops on multiagent sequential decisionmaking(MSDM)whichhaveprovidedauniqueplatformforexchangeof thoughtsondecisionmakingunderuncertainty. Furthermore, we would like to thank João Messias, Matthijs Spaan, Shimon Whiteson,andStefanWitwickifortheirfeedbackonsectionsofthemanuscript. Finally,wearegratefultoourformersupervisors,inparticularNikosVlassisand ShlomoZilberstein,whoenabledandstimulatedustogodownthepathofresearch ondecentralizeddecisionmaking. ix Contents 1 MultiagentSystemsUnderUncertainty .......................... 1 1.1 MotivatingExamples ....................................... 2 1.2 MultiagentSystems......................................... 4 1.3 Uncertainty................................................ 6 1.4 Applications .............................................. 7 2 TheDecentralizedPOMDPFramework .......................... 11 2.1 Single-AgentDecisionFrameworks ........................... 11 2.1.1 MDPs.............................................. 12 2.1.2 POMDPs ........................................... 13 2.2 MultiagentDecisionMaking:DecentralizedPOMDPs............ 14 2.3 ExampleDomains.......................................... 17 2.3.1 Dec-Tiger .......................................... 17 2.3.2 MultirobotCoordination:RecyclingandBox-Pushing ..... 19 2.3.3 NetworkProtocolOptimization ........................ 20 2.3.4 EfficientSensorNetworks............................. 20 2.4 SpecialCases,GeneralizationsandRelatedModels ............. 21 2.4.1 ObservabilityandDec-MDPs .......................... 21 2.4.2 FactoredModels..................................... 22 2.4.3 CentralizedModels:MMDPsandMPOMDPs............ 24 2.4.4 MultiagentDecisionProblems ......................... 25 2.4.5 PartiallyObservableStochasticGames .................. 30 2.4.6 InteractivePOMDPs ................................. 30 3 Finite-HorizonDec-POMDPs ................................... 33 3.1 OptimalityCriteria ......................................... 33 3.2 PolicyRepresentations:HistoriesandPolicies .................. 34 3.2.1 Histories ........................................... 34 3.2.2 Policies ............................................ 35 3.3 MultiagentBeliefs.......................................... 37 3.4 ValueFunctionsforJointPolicies ............................. 37 xi xii Contents 3.5 Complexity................................................ 39 4 ExactFinite-HorizonPlanningMethods .......................... 41 4.1 BackwardsApproach:DynamicProgramming .................. 41 4.1.1 GrowingPoliciesfromSubtreePolicies ................. 41 4.1.2 DynamicProgrammingforDec-POMDPs ............... 44 4.2 ForwardApproach:HeuristicSearch .......................... 46 4.2.1 TemporalStructureinPolicies:DecisionRules ........... 46 4.2.2 MultiagentA* ...................................... 47 4.3 ConvertingtoaNon-observableMDP ......................... 48 4.3.1 ThePlan-TimeMDPandOptimalValueFunction......... 49 4.3.2 Plan-TimeSufficientStatistics ......................... 49 4.3.3 AnNOMDPFormulation ............................. 51 4.4 OtherFinite-HorizonMethods................................ 52 4.4.1 Point-BasedDP ..................................... 52 4.4.2 Optimization ....................................... 52 5 ApproximateandHeuristicFinite-HorizonPlanningMethods ...... 55 5.1 ApproximationMethods..................................... 56 5.1.1 BoundedDynamicProgramming ....................... 56 5.1.2 EarlyStoppingofHeuristicSearch ..................... 57 5.1.3 ApplicationofPOMDPApproximationAlgorithms ....... 57 5.2 HeuristicMethods.......................................... 58 5.2.1 AlternatingMaximization ............................. 58 5.2.2 Memory-BoundedDynamicProgramming .............. 59 5.2.3 ApproximateHeuristic-SearchMethods ................. 61 5.2.4 EvolutionaryMethodsandCross-EntropyOptimization .... 64 6 Infinite-HorizonDec-POMDPs .................................. 69 6.1 OptimalityCriteria ......................................... 69 6.1.1 DiscountedCumulativeReward ........................ 69 6.1.2 AverageReward ..................................... 70 6.2 PolicyRepresentation....................................... 70 6.2.1 Finite-StateControllers:MooreandMealy............... 71 6.2.2 AnExampleSolutionforDEC-TIGER .................. 73 6.2.3 Randomization ...................................... 74 6.2.4 CorrelationDevices .................................. 74 6.3 ValueFunctionsforJointPolicies ............................. 75 6.4 Undecidability,AlternativeGoalsandTheirComplexity ......... 76 7 Infinite-HorizonPlanningMethods:DiscountedCumulativeReward 79 7.1 PolicyIteration ............................................ 79 7.2 OptimizingFixed-SizeControllers ............................ 81 7.2.1 Best-FirstSearch .................................... 82 7.2.2 BoundedPolicyIteration ............................. 83 7.2.3 NonlinearProgramming .............................. 85

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.