Feature Selection for Value Function Approximation PDF

113 Pages·2011·3.02 MB·English

Preview Feature Selection for Value Function Approximation

Feature Selection for Value Function Approximation
by
Gavin Taylor
Department of Computer Science
Duke University
Date:
Approved:
Ronald Parr, Supervisor
Vincent Conitzer
Mauro Maggioni
Peng Sun
Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University
2011

Copyright © 2011 by Gavin Taylor
All rights reserved

Abstract

The field of reinforcement learning concerns the question of automated action selection given past experiences. As an agent moves through the state space, it must recognize which state choices are best in terms of allowing it to reach its goal. This is quantified with value functions, which evaluate a state and return the sum of rewards the agent can expect to receive from that state. Given a good value function, the agent can choose the actions which maximize this sum of rewards. Value functions are often chosen from a linear space defined by a set of features; this method offers a concise structure, low computational effort, and resistance to overfitting. However, because the number of features is small, this method depends heavily on these few features being expressive and useful, making the selection of these features a core problem. This document discusses this selection.
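The abstract's description of value functions and linear architectures can be made concrete with a small sketch. It assumes a discounted problem with discount factor γ, a fixed policy, and a known transition matrix, none of which the abstract specifies. `exact_value` solves the Bellman equation directly, and `linear_fixed_point` computes the weights of the standard linear fixed-point (LSTD-style) approximation V ≈ Φw. The chain, the features, and both function names are hypothetical illustrations, not code from the dissertation.

```python
import numpy as np


def exact_value(P, R, gamma):
    """Exact value of a fixed policy: solve (I - gamma P) V = R."""
    return np.linalg.solve(np.eye(len(R)) - gamma * P, R)


def linear_fixed_point(Phi, P, R, gamma):
    """Weights w with Phi w = Pi(R + gamma P Phi w), where Pi is the
    unweighted least-squares projection onto the span of Phi's columns."""
    A = Phi.T @ (Phi - gamma * P @ Phi)
    b = Phi.T @ R
    return np.linalg.solve(A, b)


# Hypothetical 4-state chain: each state moves one step right, the last
# state loops on itself and is the only one that pays a reward.
P = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]], dtype=float)
R = np.array([0.0, 0.0, 0.0, 1.0])
gamma = 0.9

V = exact_value(P, R, gamma)  # approximately [7.29, 8.1, 9.0, 10.0]

# Two hand-picked features: a constant and the state index.
Phi = np.column_stack([np.ones(4), np.arange(4.0)])
w = linear_fixed_point(Phi, P, R, gamma)
V_hat = Phi @ w  # the best this small feature set can do at the fixed point
```

With only two features the approximation cannot match V exactly; replacing `Phi` with `np.eye(4)` (one indicator feature per state) recovers the exact values, which illustrates how strongly the result depends on the chosen features.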
Aside from a review of the field, contributions include a new understanding of the role approximate models play in value function approximation, leading to new methods for analyzing feature sets in an intuitive way, using both the linear and the related kernelized approximation architectures. Additionally, we present a new method for automatically choosing features during value function approximation which has a bounded approximation error and produces superior policies, even in extremely noisy domains.

To Allison, who supported me more than I knew was possible.

Contents

Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
1.1 MDPs and Value Functions
1.2 Topics
1.2.1 Regression
1.2.2 Models
1.2.3 Regularization
1.3 Document Organization and Contributions
2 Notation and Past Work
2.1 Formal Problem Statement and Notation
2.1.1 MDPs, Value Functions, and Policies
2.1.2 Sampling and the Bellman Operator
2.1.3 Value Function Approximation Architectures
2.2 Value Function Calculation Algorithms
2.2.1 Value Iteration
2.2.2 Fitted Value Iteration
2.2.3 Policy Iteration
2.2.4 Least-Squares Policy Iteration
2.2.5 Linear Programming
3 Linear Value Function Approximation
3.1 Previous Work
3.1.1 Linear Fixed-Point Methods
3.1.2 Linear Feature Generation
3.2 Linear Models
3.3 Linear Fixed-Point Solution and Linear Model Solution Equivalence
4 Kernel-Based Value Function Approximators
4.1 Previous Work
4.1.1 Kernelized Regression
4.1.2 Kernelized Value Function Approximation
4.2 A General Kernelized Model-Based Solution
4.3 Equivalence
5 Analysis of Error
5.1 Error in Linear Value Function Approximations
5.1.1 Experimental Results
5.2 Error in Kernel-Based Value Function Approximation
5.2.1 Experimental Results
5.3 Generalized Analysis
6 L1-Regularization for Feature Selection
6.1 Previous Work
6.1.1 Approximate Linear Programming
6.1.2 L1 Regularization for Regression
6.1.3 L1 Regularization for Value Function Approximation
6.2 L1-Regularized Approximate Linear Programming
6.3 Theoretical Bounds
6.3.1 Noiseless RALP
6.3.2 Noisy RALP
6.4 Experimental Results
6.4.1 Benefits of Regularization
6.4.2 Benchmark Problems
7 L1-Regularized Approximate Linear Programming in Noisy Domains
7.1 Smoothers and Averagers for Value Function Approximation
7.2 Locally Smoothed L1-Regularized Approximate Linear Programming
7.3 Theoretical Results
7.4 Experimental Results
8 Future Work
9 Summary and Conclusions
9.1 Linear Analysis
9.2 Kernel Analysis
9.3 Automatic Feature Selection
Bibliography
Biography

List of Tables

4.1 Previously introduced methods of kernelized value-function approximation are equivalent to the novel model-based approximation
7.1 Performance of LS-RALP and RALP for the noisy mountain car

List of Figures

3.1 Illustration of the linear fixed point
5.1 Illustration of the two-room problem from Mahadevan and Maggioni (2006)
5.2 Decomposition of the Bellman error for three different problems
5.3 Two-room domain
5.4 Decomposition of the Bellman error for the continuous two-room problem
6.1 A simple four-state chain with no action choices
6.2 Illustration of the geometry of L1 regularization
6.3 Illustration of the effect of L1 regularization on regression
6.4 Illustration of Lemma 6.3.5
6.5 Illustration of the effect of noise in RALP
6.6 Comparison of the objective value of RALP with the true error
6.7 Comparison of the performance of RALP for multiple values of ψ
6.8 Comparison of the performance of RALP and ALP for an increasing number of features
6.9 RALP performance on pendulum as a function of the number of episodes
6.10 RALP performance on bicycle as a function of the number of episodes
7.1 Performance of LS-RALP, RALP, and LSPI for the noisy pendulum
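The abstract's second contribution, automatic feature selection with a bounded approximation error, corresponds to the L1-regularized approximate linear programming (RALP) material in Chapters 6 and 7. The sketch below shows the general idea as we understand it, not the dissertation's implementation: an approximate linear program whose feature weights are held under an L1 bound ψ, so that a small ψ forces most weights to zero. Assumptions that the outline above does not state: the first feature is a constant that is excluded from the penalty, the state-relevance weights ρ are uniform, the model P is known exactly (the sampled and noisy settings of Chapters 6 and 7 are not modeled), and SciPy's `linprog` with the HiGHS backend is used as the LP solver. The function name `ralp` and the example chain are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def ralp(Phi, P, R, gamma, psi, rho=None):
    """Hypothetical sketch of an L1-regularized approximate linear program.

    min_w  rho^T Phi w
    s.t.   Phi w >= R + gamma P Phi w    (Bellman inequality, one row per state)
           sum_{j >= 1} |w_j| <= psi     (L1 bound; column 0 is an unpenalized constant)
    """
    n, k = Phi.shape
    if rho is None:
        rho = np.full(n, 1.0 / n)  # assumed uniform state-relevance weights
    # Write w = u - v with v >= 0 and u_j >= 0 for penalized features, so that
    # |w_j| <= u_j + v_j can be bounded linearly.
    c_w = Phi.T @ rho
    c = np.concatenate([c_w, -c_w])
    # Bellman inequality in linprog's A_ub x <= b_ub form: -(Phi - gamma P Phi) w <= -R.
    M = Phi - gamma * P @ Phi
    A_bellman = np.hstack([-M, M])
    # L1 row: sum of u_j + v_j over the non-constant features only.
    mask = np.r_[0.0, np.ones(k - 1)]
    A_ub = np.vstack([A_bellman, np.concatenate([mask, mask])])
    b_ub = np.concatenate([-R, [psi]])
    # The constant weight is free (carried by u_0 with v_0 fixed at 0);
    # all other split variables are nonnegative.
    bounds = [(None, None)] + [(0, None)] * (k - 1) + [(0, 0)] + [(0, None)] * (k - 1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:k] - res.x[k:]


# Hypothetical four-state chain with no action choices: each state moves
# right, and the last state loops on itself and is the only rewarding state.
P = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 1]], dtype=float)
R = np.array([0.0, 0.0, 0.0, 1.0])
gamma = 0.9
# Candidate features: constant, state index, indicator of the rewarding state.
Phi = np.column_stack([np.ones(4), np.arange(4.0), (np.arange(4) == 3).astype(float)])

for psi in (0.0, 1.0, 5.0):
    print(psi, np.round(ralp(Phi, P, R, gamma, psi), 3))
```

With ψ = 0 only the constant feature survives, giving a flat (and loose) upper bound on the true values; raising ψ lets the program spend weight on whichever candidate features tighten the bound most, which is the feature-selection effect the regularization is meant to provide.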
