OPTIMIZATION-BASED APPROXIMATE DYNAMIC PROGRAMMING

A Dissertation Presented by MAREK PETRIK

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, September 2010. Department of Computer Science.

© Copyright by Marek Petrik 2010. All Rights Reserved.

Approved as to style and content by: Shlomo Zilberstein (Chair), Andrew Barto (Member), Sridhar Mahadevan (Member), Ana Muriel (Member), Ronald Parr (Member). Andrew Barto, Department Chair, Department of Computer Science.

To my parents Fedor and Mariana

ACKNOWLEDGMENTS

I want to thank the people who made my stay at UMass not only productive, but also very enjoyable. I am grateful to my advisor, Shlomo Zilberstein, for guiding and supporting me throughout the completion of this work. Shlomo's thoughtful advice and probing questions greatly influenced both my thinking and my research. His advice was essential not only in forming and refining many of the ideas described in this work, but also in ensuring that I became a productive member of the research community. I hope that, one day, I will be able to become an advisor who is just as helpful and dedicated as he is.

The members of my dissertation committee were indispensable in forming and steering the topic of this dissertation. The class I took with Andrew Barto motivated me to probe the foundations of reinforcement learning, which became one of the foundations of this thesis. Sridhar Mahadevan's exciting work on representation discovery led me to deepen my understanding and appreciation of approximate dynamic programming. I really appreciate the detailed comments and encouragement that Ron Parr provided on my research and thesis drafts. Ana Muriel helped me to better understand the connections between my research and applications in operations research. Coauthoring papers with Jeff Johns, Bruno Scherrer, and Gavin Taylor was a very stimulating learning experience.

My research was also influenced by interactions with many other researchers. The conversations with Raghav Aras, Warren Powell, Scott Sanner, and Csaba Szepesvari were especially illuminating. This work was also supported by generous funding from the Air Force Office of Scientific Research.

Conversations with my lab mate Hala Mostafa made the long hours in the lab much more enjoyable. While our conversations often did not involve research, those that did motivated me to think more deeply about the foundations of my work. I also found sharing ideas with my fellow graduate students Martin Allen, Chris Amato, Alan Carlin, Phil Kirlin, Akshat Kumar, Sven Seuken, Siddharth Srivastava, and Feng Wu helpful in understanding the broader research topics. My free time at UMass kept me sane thanks to the many great friends I found here.

Finally, and most importantly, I want to thank my family. They were supportive and helpful throughout the long years of my education. My mom's loving kindness and my dad's intense fascination with the world were especially important in forming my interests and work habits. My wife Jana has been an incredible source of support and motivation in both research and private life; her companionship made it all worthwhile. It was a great journey.
ABSTRACT

OPTIMIZATION-BASED APPROXIMATE DYNAMIC PROGRAMMING

SEPTEMBER 2010

MAREK PETRIK
Mgr., UNIVERZITA KOMENSKEHO, BRATISLAVA, SLOVAKIA
M.Sc., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Shlomo Zilberstein

Reinforcement learning algorithms hold promise in many complex domains, such as resource management and planning under uncertainty. Most reinforcement learning algorithms are iterative: they successively approximate the solution based on a set of samples and features. Although these iterative algorithms can achieve impressive results in some domains, they are not sufficiently reliable for wide applicability; they often require extensive parameter tweaking to work well and provide only weak guarantees of solution quality.

Some of the most interesting reinforcement learning algorithms are based on approximate dynamic programming (ADP). ADP, also known as value function approximation, approximates the value of being in each state. This thesis presents new reliable algorithms for ADP that use optimization instead of iterative improvement. Because these optimization-based algorithms explicitly seek solutions with favorable properties, they are easy to analyze, offer much stronger guarantees than iterative algorithms, and have few or no parameters to tweak. In particular, we improve on approximate linear programming, an existing method, and derive approximate bilinear programming, a new robust approximate method.

The strong guarantees of optimization-based algorithms not only increase confidence in the solution quality, but also make it easier to combine the algorithms with other ADP components. The other components of ADP are the samples and features used to approximate the value function. Relying on the simplified analysis of optimization-based methods, we derive new bounds on the error due to missing samples. These bounds are simpler, tighter, and more practical than the existing bounds for iterative algorithms and can be used to evaluate solution quality in practical settings. Finally, we propose homotopy methods that use the sampling bounds to automatically select good approximation features for optimization-based algorithms. Automatic feature selection significantly increases the flexibility and applicability of the proposed ADP methods.

The methods presented in this thesis can potentially be used in many practical applications in artificial intelligence, operations research, and engineering. Our experimental results show that optimization-based methods may perform well on resource-management problems and standard benchmark problems and therefore represent an attractive alternative to traditional iterative methods.
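The approximate linear program that the abstract improves on (and that Chapter 4 studies) is compact enough to sketch directly. The following is a minimal sketch, assuming the standard ALP formulation: the value function is restricted to linear combinations Φw of features, and the Bellman optimality inequalities become linear constraints on the weights w. The MDP data below (P, r, phi, rho) is synthetic and the variable names are illustrative, not the thesis's own code.

```python
import numpy as np
from scipy.optimize import linprog

# Synthetic MDP: 4 states, 2 actions, discount factor gamma.
rng = np.random.default_rng(0)
n_states, n_actions, n_features = 4, 2, 3
gamma = 0.95

# P[a] is the n_states x n_states transition matrix for action a;
# each row is a probability distribution over next states.
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.uniform(size=(n_actions, n_states))  # r[a, s]: reward for action a in state s

# Feature matrix Phi; including a constant feature keeps the program feasible.
phi = np.hstack([np.ones((n_states, 1)), rng.uniform(size=(n_states, n_features - 1))])
rho = np.full(n_states, 1.0 / n_states)      # state-relevance weights

# ALP: minimize rho^T Phi w  subject to  Phi w >= r_a + gamma * P_a Phi w  for all a.
# Rearranged as (Phi - gamma * P_a Phi) w >= r_a; linprog uses <=, so negate.
A_ub = np.vstack([-(phi - gamma * P[a] @ phi) for a in range(n_actions)])
b_ub = np.concatenate([-r[a] for a in range(n_actions)])

res = linprog(c=phi.T @ rho, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n_features)
v_approx = phi @ res.x  # pointwise upper bound on the optimal value function
print("approximate value function:", v_approx)
```

Any feasible w gives a Φw that upper-bounds the optimal value function, which is one source of the looseness the thesis addresses; the approximate bilinear programs of Chapters 5 and 7 tighten the approximation at the cost of a harder, bilinear optimization problem.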
CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
   1.1 Planning Models
   1.2 Challenges and Contributions
   1.3 Outline

PART I: FORMULATIONS

2. FRAMEWORK: APPROXIMATE DYNAMIC PROGRAMMING
   2.1 Framework and Notation
   2.2 Model: Markov Decision Process
   2.3 Value Functions and Policies
   2.4 Approximately Solving Markov Decision Processes
   2.5 Approximation Error: Online and Offline
   2.6 Contributions

3. ITERATIVE VALUE FUNCTION APPROXIMATION
   3.1 Basic Algorithms
   3.2 Bounds on Approximation Error
   3.3 Monotonous Approximation: Achieving Convergence
   3.4 Contributions

4. APPROXIMATE LINEAR PROGRAMMING: TRACTABLE BUT LOOSE APPROXIMATION
   4.1 Formulation
   4.2 Sample-based Formulation
   4.3 Offline Error Bounds
   4.4 Practical Performance and Lower Bounds
   4.5 Expanding Constraints
   4.6 Relaxing Constraints
   4.7 Empirical Evaluation
   4.8 Discussion
   4.9 Contributions

5. APPROXIMATE BILINEAR PROGRAMMING: TIGHT APPROXIMATION
   5.1 Bilinear Program Formulations
   5.2 Sampling Guarantees
   5.3 Solving Bilinear Programs
   5.4 Discussion and Related ADP Methods
   5.5 Empirical Evaluation
   5.6 Contributions

PART II: ALGORITHMS

6. HOMOTOPY CONTINUATION METHOD FOR APPROXIMATE LINEAR PROGRAMS
   6.1 Homotopy Algorithm
   6.2 Penalty-based Homotopy Algorithm
   6.3 Efficient Implementation
   6.4 Empirical Evaluation
   6.5 Discussion and Related Work
   6.6 Contributions

7. SOLVING APPROXIMATE BILINEAR PROGRAMS
   7.1 Solution Approaches
   7.2 General Mixed Integer Linear Program Formulation
   7.3 ABP-Specific Mixed Integer Linear Program Formulation
   7.4 Homotopy Methods
   7.5 Contributions

8. SOLVING SMALL-DIMENSIONAL BILINEAR PROGRAMS
   8.1 Bilinear Program Formulations
   8.2 Dimensionality Reduction
   8.3 Successive Approximation Algorithm
   8.4 Online Error Bound
   8.5 Advanced Pivot Point Selection
   8.6 Offline Bound
   8.7 Contributions

PART III: SAMPLING, FEATURE SELECTION, AND SEARCH

9. SAMPLING BOUNDS
   9.1 Sampling In Value Function Approximation
   9.2 State Selection Error Bounds
   9.3 Uniform Sampling Behavior
   9.4 Transition Estimation Error
   9.5 Implementation of the State Selection Bounds
   9.6 Discussion and Related Work
   9.7 Empirical Evaluation
   9.8 Contributions

10. FEATURE SELECTION
   10.1 Feature Considerations
   10.2 Piecewise Linear Features
   10.3 Selecting Features
   10.4 Related Work
   10.5 Empirical Evaluation
   10.6 Contributions

11. HEURISTIC SEARCH
   11.1 Introduction
   11.2 Search Framework
   11.3 Learning Heuristic Functions
   11.4 Feature Combination as a Linear Program

