
Approximate Dynamic Programming: Solving the Curses of Dimensionality PDF

487 Pages·2007·24.62 MB·English

Preview Approximate Dynamic Programming: Solving the Curses of Dimensionality

APPROXIMATE DYNAMIC PROGRAMMING
Solving the Curses of Dimensionality

Warren B. Powell
Princeton University
Princeton, New Jersey

WILEY-INTERSCIENCE
A John Wiley & Sons, Inc., Publication

THE WILEY BICENTENNIAL: KNOWLEDGE FOR GENERATIONS

Each generation has its unique needs and aspirations. When Charles Wiley first opened his small printing shop in lower Manhattan in 1807, it was a generation of boundless potential searching for an identity. And we were there, helping to define a new American literary tradition. Over half a century later, in the midst of the Second Industrial Revolution, it was a generation focused on building the future. Once again, we were there, supplying the critical scientific, technical, and engineering knowledge that helped frame the world. Throughout the 20th Century, and into the new millennium, nations began to reach out beyond their own borders and a new international community was born. Wiley was there, expanding its operations around the world to enable a global exchange of ideas, opinions, and know-how.

For 200 years, Wiley has been an integral part of each generation's journey, enabling the flow of information and understanding necessary to meet their needs and fulfill their aspirations. Today, bold new technologies are changing the way we live and learn. Wiley will be there, providing you the must-have knowledge you need to imagine new worlds, new possibilities, and new opportunities.

Generations come and go, but you can always count on Wiley to provide you the knowledge you need, when and where you need it!

WILLIAM J. PESCE, PRESIDENT AND CHIEF EXECUTIVE OFFICER
PETER BOOTH WILEY, CHAIRMAN OF THE BOARD

Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:
Powell, Warren B., 1955-
Approximate dynamic programming : solving the curses of dimensionality / Warren B. Powell.
p. ; cm. (Wiley series in probability and statistics)
Includes bibliographical references.
ISBN 978-0-470-17155-4 (cloth : alk. paper)
1. Dynamic programming. I. Title.
T57.83.P76 2007
519.7'03-dc22 2007013724

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

CONTENTS

Preface
Acknowledgments

1 The challenges of dynamic programming
  1.1 A dynamic programming example: a shortest path problem
  1.2 The three curses of dimensionality
  1.3 Some real applications
  1.4 Problem classes
  1.5 The many dialects of dynamic programming
  1.6 What is new in this book?
  1.7 Bibliographic notes

2 Some illustrative models
  2.1 Deterministic problems
  2.2 Stochastic problems
  2.3 Information acquisition problems
  2.4 A simple modeling framework for dynamic programs
  2.5 Bibliographic notes
  Problems

3 Introduction to Markov decision processes
  3.1 The optimality equations
  3.2 Finite horizon problems
  3.3 Infinite horizon problems
  3.4 Value iteration
  3.5 Policy iteration
  3.6 Hybrid value-policy iteration
  3.7 The linear programming method for dynamic programs
  3.8 Monotone policies*
  3.9 Why does it work?**
  3.10 Bibliographic notes
  Problems

4 Introduction to approximate dynamic programming
  4.1 The three curses of dimensionality (revisited)
  4.2 The basic idea
  4.3 Sampling random variables
  4.4 ADP using the post-decision state variable
  4.5 Low-dimensional representations of value functions
  4.6 So just what is approximate dynamic programming?
  4.7 Experimental issues
  4.8 Dynamic programming with missing or incomplete models
  4.9 Relationship to reinforcement learning
  4.10 But does it work?
  4.11 Bibliographic notes
  Problems

5 Modeling dynamic programs
  5.1 Notational style
  5.2 Modeling time
  5.3 Modeling resources
  5.4 The states of our system
  5.5 Modeling decisions
  5.6 The exogenous information process
  5.7 The transition function
  5.8 The contribution function
  5.9 The objective function
  5.10 A measure-theoretic view of information**
  5.11 Bibliographic notes
  Problems

6 Stochastic approximation methods
  6.1 A stochastic gradient algorithm
  6.2 Deterministic stepsize recipes
  6.3 Stochastic stepsizes
  6.4 Computing bias and variance
  6.5 Optimal stepsizes
  6.6 Some experimental comparisons of stepsize formulas
  6.7 Convergence
  6.8 Why does it work?**
  6.9 Bibliographic notes
  Problems

7 Approximating value functions
  7.1 Approximation using aggregation
  7.2 Approximation methods using regression models
  7.3 Recursive methods for regression models
  7.4 Neural networks
  7.5 Value function approximation for batch processes
  7.6 Why does it work?**
  7.7 Bibliographic notes
  Problems

8 ADP for finite horizon problems
  8.1 Strategies for finite horizon problems
  8.2 Q-learning
  8.3 Temporal difference learning
  8.4 Policy iteration
  8.5 Monte Carlo value and policy iteration
  8.6 The actor-critic paradigm
  8.7 Bias in value function estimation
  8.8 State sampling strategies
  8.9 Starting and stopping
  8.10 A taxonomy of approximate dynamic programming strategies
  8.11 Why does it work?**
  8.12 Bibliographic notes
  Problems

9 Infinite horizon problems
  9.1 From finite to infinite horizon
  9.2 Algorithmic strategies
  9.3 Stepsizes for infinite horizon problems
  9.4 Error measures
  9.5 Direct ADP for on-line applications
  9.6 Finite horizon models for steady-state applications
  9.7 Why does it work?**
  9.8 Bibliographic notes
  Problems

10 Exploration vs. exploitation
  10.1 A learning exercise: the nomadic trucker
  10.2 Learning strategies
  10.3 A simple information acquisition problem
  10.4 Gittins indices and the information acquisition problem
  10.5 Variations
  10.6 The knowledge gradient algorithm
  10.7 Information acquisition in dynamic programming
  10.8 Bibliographic notes
  Problems

11 Value function approximations for special functions
  11.1 Value functions versus gradients
  11.2 Linear approximations
  11.3 Piecewise linear approximations
  11.4 The SHAPE algorithm
  11.5 Regression methods
  11.6 Cutting planes*
  11.7 Why does it work?**
  11.8 Bibliographic notes
  Problems

12 Dynamic resource allocation problems
  12.1 An asset acquisition problem
  12.2 The blood management problem
  12.3 A portfolio optimization problem
  12.4 A general resource allocation problem
  12.5 A fleet management problem
  12.6 A driver management problem
  12.7 Bibliographic references
  Problems

13 Implementation challenges
  13.1 Will ADP work for your problem?

Description:
A complete and accessible introduction to the real-world applications of approximate dynamic programming. With the growing levels of sophistication in modern-day operations, it is vital for practitioners to understand how to approach, model, and solve complex industrial problems. Approximate Dynamic
