
Self-Learning Optimal Control of Nonlinear Systems: Adaptive Dynamic Programming Approach (PDF)

240 pages · 8.744 MB · English

Preview: Self-Learning Optimal Control of Nonlinear Systems: Adaptive Dynamic Programming Approach

Studies in Systems, Decision and Control, Volume 103

Qinglai Wei · Ruizhuo Song · Benkai Li · Xiaofeng Lin

Self-Learning Optimal Control of Nonlinear Systems: Adaptive Dynamic Programming Approach

Series editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland (e-mail: [email protected])

About this Series

The series "Studies in Systems, Decision and Control" (SSDC) covers both new developments and advances, as well as the state of the art, in the various areas of broadly perceived systems, decision making and control, quickly, up to date, and with high quality. The intent is to cover the theory, applications, and perspectives on the state of the art and future developments relevant to systems, decision making, control, complex processes and related areas, as embedded in the fields of engineering, computer science, physics, economics, social and life sciences, as well as the paradigms and methodologies behind them. The series contains monographs, textbooks, lecture notes and edited volumes in systems, decision making and control spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and others. Of particular value to both the contributors and the readership are the short publication time frame and the worldwide distribution and exposure, which enable a wide and rapid dissemination of research output.

More information about this series at http://www.springer.com/series/13304

Qinglai Wei, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Ruizhuo Song, University of Science and Technology Beijing, Beijing, China
Benkai Li, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Xiaofeng Lin, Guangxi University, Guangxi, China

ISSN 2198-4182, ISSN 2198-4190 (electronic)
Studies in Systems, Decision and Control
ISBN 978-981-10-4079-5, ISBN 978-981-10-4080-1 (eBook)
DOI 10.1007/978-981-10-4080-1

Jointly published with Science Press, Beijing, China (ISBN 978-7-03-052060-9, Science Press, Beijing, China). Not for sale outside the Mainland of China (not for sale in Hong Kong SAR, Macau SAR, and Taiwan, and all countries except the Mainland of China).

Library of Congress Control Number: 2017934060

© Science Press, Beijing and Springer Nature Singapore Pte Ltd. 2018

This work is subject to copyright. All rights are reserved by the Publishers, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publishers, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publishers nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publishers remain neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper.

This Springer imprint is published by Springer Nature. The registered company is Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore.

Preface

Background of this Book

Optimal control theory is a mathematical optimization method for deriving control policies while guaranteeing stability. Optimal control problems generally involve nonlinear dynamical systems, which are ubiquitous in nature, and after several decades of study in science and engineering, optimal controls can be derived by many methods. Discovered by Richard Bellman, the dynamic programming equation is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. However, running dynamic programming directly is often computationally untenable because of the "curse of dimensionality". Approximate solutions of dynamic programming are therefore required, and the adaptive dynamic programming (ADP) method was first proposed by Werbos in 1977. In this method, a system called the "critic" is built to approximate the performance index function of dynamic programming and thereby obtain an approximate optimal control solution of the Hamilton–Jacobi–Bellman (HJB) equation. Specifically, by using a function approximation structure, generally constructed from neural networks, to approximate the solution of the HJB equation, the method obtains the approximate optimal control policy offline or online. Having gained much attention from researchers for decades, ADP algorithms now rest on a solid foundation and have made substantial progress.
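To fix ideas, the discrete-time problem described above can be written in a standard form; the notation below is a conventional one from the ADP literature and is assumed here rather than quoted from the book:

```latex
% Plant x_{k+1} = F(x_k, u_k) with performance index
% J(x_0) = \sum_{k=0}^{\infty} U(x_k, u_k).
% Bellman's dynamic programming equation (the discrete-time counterpart
% of the HJB equation):
\[
  J^{*}(x_k) = \min_{u_k}\bigl\{\, U(x_k, u_k) + J^{*}\bigl(F(x_k, u_k)\bigr) \,\bigr\}
\]
% Value iteration approximates J^{*} by the recursion
\[
  V_{i+1}(x_k) = \min_{u_k}\bigl\{\, U(x_k, u_k) + V_i\bigl(F(x_k, u_k)\bigr) \,\bigr\},
  \qquad V_0 \equiv 0.
\]
```

The following is a minimal sketch of this value-iteration recursion in Python, with a table lookup over a discretized state space standing in for the critic; the plant f, utility U, grids, and tolerance are illustrative assumptions, not taken from the book:

```python
# Minimal value-iteration ADP sketch for a scalar discrete-time nonlinear
# system. Dynamics, utility, grids, and tolerance are hypothetical.
import numpy as np

f = lambda x, u: 0.2 * x + np.sin(u)   # assumed plant x_{k+1} = F(x_k, u_k)
U = lambda x, u: x**2 + u**2           # assumed quadratic utility function

xs = np.linspace(-2.0, 2.0, 201)       # discretized state space
us = np.linspace(-1.5, 1.5, 61)        # candidate controls
V = np.zeros_like(xs)                  # V_0(x) = 0, the usual initialization

X, Uc = np.meshgrid(xs, us, indexing="ij")
Xn = np.clip(f(X, Uc), xs[0], xs[-1])  # successor states, kept on the grid

for i in range(500):
    # V_{i+1}(x) = min_u { U(x, u) + V_i(F(x, u)) }, where V_i at the
    # successor state is read off by linear interpolation (the "critic")
    Q = U(X, Uc) + np.interp(Xn.ravel(), xs, V).reshape(X.shape)
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # stop once the recursion converges
        break
    V = V_new

policy = us[Q.argmin(axis=1)]          # greedy control law u(x) on the grid
```

In the book's setting, a critic neural network takes the place of this lookup table and an action network is trained to output the minimizing control; see the critic- and action-network sections in the contents below.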
The Content of this Book

This book focuses on the most recent developments in iterative adaptive dynamic programming algorithms. The book is organized into ten chapters. Chapter 1 presents the basic principles of ADP algorithms. In Chap. 2, a finite-horizon iterative ADP algorithm is proposed to solve the optimal control problem for a class of discrete-time nonlinear systems with unfixed initial state. Chapters 3–5 focus on Q-learning algorithms, which are developed to solve optimal control problems and infinite-horizon optimal tracking problems. In Chaps. 6 and 7, ADP algorithms are developed for discrete-time nonlinear systems with general multiobjective performance index functions. In Chap. 8, an online ADP-based optimal control scheme is developed for continuous-time chaotic systems, and in Chap. 9, an off-policy integral reinforcement learning algorithm is established to obtain the optimal tracking control of unknown chaotic systems. The final chapter proposes a novel sensor scheduling scheme based on ADP, which makes the sensor energy consumption and tracking error optimal over the system operational horizon for wireless sensor networks with solar energy harvesting.

Qinglai Wei, Beijing, China
Ruizhuo Song, Beijing, China
Benkai Li, Beijing, China
Xiaofeng Lin, Guangxi, China
January 2017

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61374105, 61503379, 61304079, 61673054, 61533017, 60964002, and 61364007, and in part by the Guangxi Natural Science Foundation under Grant 2011GXNSFC018017.

Contents
1 Principle of Adaptive Dynamic Programming
  1.1 Dynamic Programming
    1.1.1 Discrete-Time Systems
    1.1.2 Continuous-Time Systems
  1.2 Original Forms of Adaptive Dynamic Programming
    1.2.1 Principle of Adaptive Dynamic Programming
  1.3 Iterative Forms of Adaptive Dynamic Programming
    1.3.1 Value Iteration
    1.3.2 Policy Iteration
  1.4 About This Book
  References

2 An Iterative ε-Optimal Control Scheme for a Class of Discrete-Time Nonlinear Systems with Unfixed Initial State
  2.1 Introduction
  2.2 Problem Statement
  2.3 Properties of the Iterative Adaptive Dynamic Programming Algorithm
    2.3.1 Derivation of the Iterative ADP Algorithm
    2.3.2 Properties of the Iterative ADP Algorithm
  2.4 The ε-Optimal Control Algorithm
    2.4.1 The Derivation of the ε-Optimal Control Algorithm
    2.4.2 Properties of the ε-Optimal Control Algorithm
    2.4.3 The ε-Optimal Control Algorithm for Unfixed Initial State
    2.4.4 The Expressions of the ε-Optimal Control Algorithm
  2.5 Neural Network Implementation for the ε-Optimal Control Scheme
    2.5.1 The Critic Network
    2.5.2 The Action Network
  2.6 Simulation Study
  2.7 Conclusions
  References

3 Discrete-Time Optimal Control of Nonlinear Systems via Value Iteration-Based Q-Learning
  3.1 Introduction
  3.2 Preliminaries and Assumptions
    3.2.1 Problem Formulations
    3.2.2 Derivation of the Discrete-Time Q-Learning Algorithm
  3.3 Properties of the Discrete-Time Q-Learning Algorithm
    3.3.1 Non-Discount Case
    3.3.2 Discount Case
  3.4 Neural Network Implementation for the Discrete-Time Q-Learning Algorithm
    3.4.1 The Action Network
    3.4.2 The Critic Network
    3.4.3 Training Phase
  3.5 Simulation Study
    3.5.1 Example 1
    3.5.2 Example 2
  3.6 Conclusion
  References

4 A Novel Policy Iteration-Based Deterministic Q-Learning for Discrete-Time Nonlinear Systems
  4.1 Introduction
  4.2 Problem Formulation
  4.3 Policy Iteration-Based Deterministic Q-Learning Algorithm for Discrete-Time Nonlinear Systems
    4.3.1 Derivation of the Policy Iteration-Based Deterministic Q-Learning Algorithm
    4.3.2 Properties of the Policy Iteration-Based Deterministic Q-Learning Algorithm
  4.4 Neural Network Implementation for the Policy Iteration-Based Deterministic Q-Learning Algorithm
    4.4.1 The Critic Network
    4.4.2 The Action Network
    4.4.3 Summary of the Policy Iteration-Based Deterministic Q-Learning Algorithm
