
Anytime Optimal MDP Planning with Trial-based Heuristic Tree Search PDF

220 Pages·2016·1.43 MB·English

Preview Anytime Optimal MDP Planning with Trial-based Heuristic Tree Search

Dissertation for the attainment of the doctoral degree of the Faculty of Engineering (Technische Fakultät), Albert-Ludwigs-Universität Freiburg im Breisgau

Anytime Optimal MDP Planning with Trial-based Heuristic Tree Search

Thomas Keller

2015

Dean: Prof. Dr. Georg Lausen, University of Freiburg, Germany
PhD advisor and first reviewer: Prof. Dr. Bernhard Nebel, University of Freiburg, Germany
Second reviewer: Prof. Dr. Wolfram Burgard, University of Freiburg, Germany
Examination committee:
Prof. Dr. Bernd Becker (co-chair), University of Freiburg, Germany
Prof. Dr. Wolfram Burgard, University of Freiburg, Germany
Prof. Dr. Bernhard Nebel, University of Freiburg, Germany
Prof. Dr. Matthias Teschner (chair), University of Freiburg, Germany
Date of disputation: July 7, 2015

To Catherine & Jonathan.

Abstract

Planning and acting in a dynamic environment is a challenging task for an autonomous agent, especially in the presence of uncertain and exogenous effects, a large number of states, and a long-term planning horizon. In this thesis, we approach the problem by considering algorithms that interleave planning for the current state and execution of the taken decision. The main challenge of the agent is to use its tight deliberation time wisely. One solution is to use determinizations, which simplify the Markov Decision Process that describes the uncertain environment to a deterministic planning problem. We introduce an all-outcomes determinization where, unlike in comparable methods, the number of deterministic actions is not exponentially but polynomially bounded in the number of parallel probabilistic effects. We discuss three algorithms that base their decision solely on the solution to a determinization, and show that they have fundamental limitations that prevent optimal behavior even if provided with unlimited resources.

The main contribution of this thesis, the Trial-based Heuristic Tree Search (THTS) framework, allows the description of algorithms in terms of only six ingredients that can be mixed and matched at will. We present a selection of ingredients and analyze theoretically which combinations yield asymptotically optimal behavior. Our implementation of the THTS framework, the probabilistic planner PROST, not only allows us to evaluate all anytime optimal algorithms empirically on the benchmarks of the International Probabilistic Planning Competition (IPPC), but furthermore emphasizes the potential of THTS by being the back-to-back winner of the competition in 2011 and 2014.

In the final chapter, we introduce the MDP-Evaluation Stopping Problem, the optimization problem faced by participants of IPPC 2014. We show how it can be constructed formally, discuss three special cases that are solvable in practice, and present approximate algorithms that are based on techniques that are derived from the solutions for the special cases. Finally, we show theoretically and empirically that all proposed algorithms improve significantly over the application of the state-of-the-art approach.

Zusammenfassung (Summary)

Planning and acting in a dynamic environment is a great challenge for an autonomous agent, especially under uncertainty, with many states and a long-term planning horizon. In this thesis, we approach the problem with algorithms that alternate between planning for the current state and executing the resulting action. The most important challenge for an agent is to make good use of the limited time available for decision making. Determinizations compile the Markov decision process that models the uncertain environment into a deterministic planning problem.
We present a determinization in which all possible outcomes are preserved and in which, for the first time, the number of deterministic actions is bounded polynomially rather than exponentially in the number of parallel probabilistic effects. We present three algorithms that base their decision solely on a determinization. However, these have fundamental weaknesses that prevent them from being optimal even with unlimited resources.

The main contribution of this thesis, the Trial-based Heuristic Tree Search (THTS) framework, allows algorithms to be described by six ingredients that can be mixed at will. We present a selection of ingredients and analyze theoretically which of them can be combined into asymptotically optimal recipes. Our implementation of the THTS framework, the probabilistic planner PROST, not only allows the evaluation of all optimal algorithms on the benchmarks of the International Probabilistic Planning Competition (IPPC), but also demonstrates the strengths of THTS by repeatedly winning the IPPC, in 2011 and 2014.

In the final chapter, we describe the optimization problem faced by all participants of IPPC 2014, the MDP-Evaluation Stopping Problem. We show how it can be constructed formally, discuss three special cases that can also be solved in practice, and present approximate methods based on them. Finally, we show theoretically and empirically that our algorithms are a clear improvement over the naive approach.

Acknowledgments

In the time it takes to write a thesis there is a large number of people that contribute to a successful outcome in one way or another, and it is my pleasure to thank all those wonderful people. First of all, I would like to thank my advisor Bernhard Nebel for offering me the possibility to be part of his research group. I am grateful that he gave me the freedom to pursue my own ideas while pushing me in the right moments to get the work done.
Plenty of amazing opportunities have arisen due to his efforts. Most notably, they allowed me to travel to conferences and work meetings all around the world, which led to valuable input from researchers I met on those trips. He managed to establish a relaxed and productive atmosphere in his research group, and it has always been a pleasure to be part of the group.

I also thank all other members of the research group. Roswitha Hilden and Petra Geiger have been the two persons one could turn to with any problem, and the day has yet to come that one remains unsolved. Uli Jakob not only kept the computer infrastructure running smoothly (his support with the grid has been crucial for the empirical part of this thesis), he has also made sure that I never lost my spirit by providing me with numerous headsets and by maintaining the coffee machine. I thank all three of them for everything they did for me.

I do not want to miss the opportunity to thank my colleagues Alexander Kleiner, Christian Becker-Asano, Christian Dornhege, Florian Geißer, Gabi Röger, Johannes Aldinger, Johannes Löhr, Michael Brenner, Moritz Göbelbecker, Stefan Wölfl, and Tim Schulte for many scientific and non-scientific discussions that made the research group such a great place to work. There are also some former colleagues that I collaborated especially closely with. Sebastian Kupferschmid supervised my Studienarbeit, which was a crucial point during my studies as it sparked my enthusiasm for AI, and I still benefit from his advice on good coding habits. Robert Mattmüller always made me remember that, while writing comprehensible papers is an important aim, formal correctness must not suffer from it.

I was fortunate enough to have Malte Helmert as a colleague. He willingly shared his knowledge and experience with me in numerous discussions that were key to many of my publications. Having Malte as a mentor has simplified my research significantly, and I owe him my deepest gratitude. And finally, I would like to single out Patrick Eyerich.
I remember countless late nights working side by side with him, trying to make some robot clean up a table or pick up a cup, our planning system produce reasonable policies, or finish up a paper for a deadline that was approaching way faster than what we anticipated. I no doubt have greatly benefited from our collaboration and have grown both professionally and personally. Most importantly, it has always been a great pleasure to work with Patrick, and I am truly grateful for that.

I would also like to thank Florian Geißer, Patrick Eyerich, and Robert Mattmüller for proofreading this thesis very carefully. Their comments have been of extremely high value.

Luckily, life is not always about AI, and I would like to thank all the people who enrich my life in those moments where I am not (primarily) a computer scientist. Spending time with my friends, especially with Alexander Kirschner, Andreas Rau, Daniel Kurreck, Florian Mutschelknaus, Holle Bergmann, Jasper Kittel, Regina Kurreck, Tilman Schieber, and Toffer Risch, has often given me much needed distraction, joy, and happiness.

The last couple of months prior to finishing this thesis have been special in several ways, but most importantly due to the birth of my son Jonathan. Becoming a father has influenced me like nothing before, and it has made my life so much more joyful. However, it also meant additional duties that complicated the process of writing significantly. Without the incredible support of my parents-in-law, Wiebke and Franz, it would not have been possible to finish this thesis. The time I spent with my brother-in-law Philip and his girlfriend Sabine has always been a refreshing experience that allowed me to motivate myself for whatever challenge lay ahead.

My father Wolfgang, my sister Sabrina with her husband Oli and her daughter Emma, and my aunt Hermine and her family have always provided me with unconditional support and love, and I am grateful to have such wonderful persons in my life. I have saved the last word of acknowledgment for my wife Catherine.
Due to the sacrifices you were willing to accept, especially in the last couple of months, your merits concerning this thesis cannot be put in words. Thank you for your love, support, patience, and encouragement, and for sharing the best moments of life with me!

Description:
The example we use in this chapter is the CANADIAN TRAVELER'S PROBLEM (CTP), a path ... Namesake of the CANADIAN TRAVELER'S PROBLEM is a scenario with the objective to drive a truck ... Monte-Carlo Tree Search algorithms in planning under uncertainty or of our THTS-based PROST ...