
Adaptive Dynamic Programming for Control: Algorithms and Stability PDF

431 Pages·2013·6.079 MB·English

Preview Adaptive Dynamic Programming for Control: Algorithms and Stability

Communications and Control Engineering
For further volumes: www.springer.com/series/61

Huaguang Zhang · Derong Liu · Yanhong Luo · Ding Wang

Adaptive Dynamic Programming for Control
Algorithms and Stability

Huaguang Zhang, Yanhong Luo
College of Information Science Engin.
Northeastern University
Shenyang
People’s Republic of China

Derong Liu, Ding Wang
Institute of Automation, Laboratory of Complex Systems
Chinese Academy of Sciences
Beijing
People’s Republic of China

ISSN 0178-5354 Communications and Control Engineering
ISBN 978-1-4471-4756-5
ISBN 978-1-4471-4757-2 (eBook)
DOI 10.1007/978-1-4471-4757-2
Springer London Heidelberg New York Dordrecht
Library of Congress Control Number: 2012955288
© Springer-Verlag London 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Background of This Book

Optimal control, once thought of as one of the principal and complex domains in the control field, has been studied extensively in both science and engineering for several decades. As is known, dynamical systems are ubiquitous in nature and there exist many methods to design stable controllers for dynamical systems. However, stability is only a bare minimum requirement in the design of a system. Ensuring optimality guarantees the stability of nonlinear systems. As an extension of the calculus of variations, optimal control theory is a mathematical optimization method for deriving control policies. Dynamic programming is a very useful tool in solving optimization and optimal control problems by employing the principle of optimality. However, it is often computationally untenable to run true dynamic programming due to the well-known “curse of dimensionality”. Hence, the adaptive dynamic programming (ADP) method was first proposed by Werbos in 1977. By building a system, called “critic”, to approximate the cost function in dynamic programming, one can obtain the approximate optimal control solution to dynamic programming. In recent years, ADP algorithms have gained much attention from researchers in control fields. However, with the development of ADP algorithms, more and more people want to know the answers to the following questions:

(1) Are ADP algorithms convergent?
(2) Can the algorithm stabilize a nonlinear plant?
(3) Can the algorithm be run on-line?
(4) Can the algorithm be implemented in a finite time horizon?
(5) If the answer to the first question is positive, the subsequent questions are where the algorithm converges to, and how large the error is.

Before ADP algorithms can be applied to real plants, these questions need to be answered first. Throughout this book, we will study all these questions and give specific answers to each question.

Why This Book?

Although lots of monographs on ADP have appeared, the present book has unique features, which distinguish it from others.

First, the types of system involved in this monograph are rather extensive. From the point of view of models, one can find affine nonlinear systems, non-affine nonlinear systems, switched nonlinear systems, singularly perturbed systems and time-delay nonlinear systems in this book; these are the main mathematical models in the control fields.

Second, since the monograph is a summary of recent research works of the authors, the methods presented here for stabilizing, tracking, and games, which to a great degree benefit from optimal control theory, are more advanced than those appearing in introductory books. For example, the dual heuristic programming method is used to stabilize a constrained nonlinear system, with convergence proof; a data-based robust approximate optimal controller is designed based on simultaneous weight updating of two networks; and a single network scheme is proposed to solve the non-zero-sum game for a class of continuous-time systems.

Last but not least, some rather unique contributions are included in this monograph. One notable feature is the implementation of finite horizon optimal control for discrete-time nonlinear systems, which can obtain suboptimal control solutions within a fixed finite number of control steps.
Most existing results in other books discuss only the infinite horizon control, which is not preferred in real-world applications. Besides this feature, another notable feature is that a pair of mixed optimal policies is developed to solve nonlinear games for the first time when the saddle point does not exist. Meanwhile, for the situation where the saddle point exists, existence conditions of the saddle point are avoided.

The Content of This Book

The book involves ten chapters. As implied by the book title, the main content of the book is composed of three parts; that is, optimal feedback control, nonlinear games, and related applications of ADP. In the part on optimal feedback control, the cutting-edge results on ADP-based infinite horizon and finite horizon feedback control, including stabilization control and tracking control, are presented in a systematic manner. In the part on nonlinear games, both zero-sum and non-zero-sum games are studied. For the zero-sum game, it is proved for the first time that the iterative policies converge to the mixed optimal solutions when the saddle point does not exist. For the non-zero-sum game, a single network is proposed to seek the Nash equilibrium for the first time. In the part on applications, a self-learning call admission control scheme is proposed for CDMA cellular networks, and an engine torque and air-fuel ratio control scheme is studied in detail, based on ADP.

In Chap. 1, a brief introduction to the background and development of ADP is provided. The review begins with the origin of ADP, and the basic structures and algorithm development are narrated in chronological order. After that, we turn attention to control problems based on ADP. We present this subject regarding two aspects: feedback control based on ADP and nonlinear games based on ADP. We mention a few iterative algorithms from recent literature and point out some open problems in each case.

In Chap. 2, the optimal state feedback control problem is studied based on ADP for both infinite horizon and finite horizon.
Three different structures of ADP are utilized to solve the optimal state feedback control strategies, respectively. First, considering a class of affine constrained systems, a new DHP method is developed to stabilize the system, with convergence proof. Then, due to the special advantages of GDHP structure, a new optimal control scheme is developed with discounted cost functional. Moreover, based on a least-square successive approximation method, a series of GHJB equations are solved to obtain the optimal control solutions. Finally, a novel finite-horizon optimal control scheme is developed to obtain the suboptimal control solutions within a fixed finite number of control steps. Compared with the existing results in the infinite-horizon case, the present finite-horizon optimal controller is preferred in real-world applications.

Chapter 3 presents some direct methods for solving the closed-loop optimal tracking control problem for discrete-time systems. Considering the fact that the performance index functions of optimal tracking control problems are quite different from those of optimal state feedback control problems, a new type of performance index function is defined. The methods are mainly based on iterative HDP and GDHP algorithms. We first study the optimal tracking control problem of affine nonlinear systems, and after that we study the optimal tracking control problem of non-affine nonlinear systems. It is noticed that most real-world systems need to be effectively controlled within a finite time horizon. Hence, based on the above results, we further study the finite-horizon optimal tracking control problem, using the ADP approach in the last part of Chap. 3.

In Chap.
4, the optimal state feedback control problems of nonlinear systems with time delays are studied. In general, the optimal control for time-delay systems is an infinite-dimensional control problem, which is very difficult to solve; there are presently no good methods for dealing with this problem. In this chapter, the optimal state feedback control problems of nonlinear systems with time delays both in states and controls are investigated. By introducing a delay matrix function, the explicit expression of the optimal control function can be obtained. Next, for nonlinear time-delay systems with saturating actuators, we further study the optimal control problem using a non-quadratic functional, where two optimization processes are developed for searching the optimal solutions. The above two results are for the infinite-horizon optimal control problem. To the best of our knowledge, there are no results on the finite-horizon optimal control of nonlinear time-delay systems. Hence, in the last part of this chapter, a novel optimal control strategy is developed to solve the finite-horizon optimal control problem for a class of time-delay systems.

In Chap. 5, the optimal tracking control problems of nonlinear systems with time delays are studied using the HDP algorithm. First, the HJB equation for discrete time-delay systems is derived based on state error and control error. Then, a novel iterative HDP algorithm containing the iterations of state, control law, and cost functional is developed. We also give the convergence proof for the present iterative HDP algorithm. Finally, two neural networks, i.e., the critic neural network and the action neural network, are used to approximate the value function and the corresponding control law, respectively. It is the first time that the optimal tracking control problem of nonlinear systems with time delays is solved using the HDP algorithm.

In Chap. 6, we focus on the design of controllers for continuous-time systems via the ADP approach.
Although many ADP methods have been proposed for continuous-time systems, a suitable framework in which the optimal controller can be designed for a class of general unknown continuous-time systems still has not been developed. In the first part of this chapter, we develop a new scheme to design optimal robust tracking controllers for unknown general continuous-time nonlinear systems. The merit of the present method is that we require only the availability of input/output data, instead of an exact system model. The obtained control input can be guaranteed to be close to the optimal control input within a small bound. In the second part of the chapter, a novel ADP-based robust neural network controller is developed for a class of continuous-time non-affine nonlinear systems, which is the first attempt to extend the ADP approach to continuous-time non-affine nonlinear systems.

In Chap. 7, several special optimal feedback control schemes are investigated. In the first part, the optimal feedback control problem of affine nonlinear switched systems is studied. To seek optimal solutions, a novel two-stage ADP method is developed. The algorithm can be divided into two stages: first, for each possible mode, calculate the associated value function, and then select the optimal mode for each state. In the second and third parts, the near-optimal controllers for nonlinear descriptor systems and singularly perturbed systems are solved by iterative DHP and HDP algorithms, respectively. In the fourth part, the near-optimal state-feedback control problem of nonlinear constrained discrete-time systems is solved via a single network ADP algorithm. At each step of the iterative algorithm, a neural network is utilized to approximate the costate function, and then the optimal control policy of the system can be computed directly according to the costate function, which removes the action network appearing in the ordinary ADP structure.

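
The two-stage idea described for switched systems (evaluate each mode, then pick the best mode per state) can be illustrated with a minimal tabular sketch. This is not the authors' algorithm: the book works with nonlinear systems and function approximators, whereas the sketch below assumes a tiny finite state space, deterministic mode dynamics, and an undiscounted cost. The function name `two_stage_value_iteration`, the toy dynamics, and the costs are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def two_stage_value_iteration(
    n_states: int,
    step_fns: Sequence[Callable[[int], int]],
    cost_fns: Sequence[Callable[[int], float]],
    sweeps: int = 50,
) -> Tuple[List[float], List[int]]:
    """Tabular value iteration for a toy switched system (illustrative only)."""
    values = [0.0] * n_states
    modes = [0] * n_states
    for _ in range(sweeps):
        # Stage 1: for each mode, compute its candidate value in every state.
        per_mode = [
            [cost(s) + values[step(s)] for s in range(n_states)]
            for step, cost in zip(step_fns, cost_fns)
        ]
        # Stage 2: in each state, keep the mode with the smallest candidate.
        new_values = []
        for s in range(n_states):
            best = min(range(len(per_mode)), key=lambda m: per_mode[m][s])
            modes[s] = best
            new_values.append(per_mode[best][s])
        values = new_values
    return values, modes


# Hypothetical example: state 0 is the target and costs nothing.
# Mode 0 moves one step toward 0 at cost 1.0; mode 1 moves two steps at cost 1.5.
step_fns = [lambda s: max(s - 1, 0), lambda s: max(s - 2, 0)]
cost_fns = [lambda s: 0.0 if s == 0 else 1.0, lambda s: 0.0 if s == 0 else 1.5]
values, modes = two_stage_value_iteration(5, step_fns, cost_fns)
print(values)  # cost-to-go per state
print(modes)   # selected mode per state
```

In this toy setting the selected mode differs from state to state, which is the point of the second stage: the switching law is a function of the state rather than a single global choice.
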
Game theory is concerned with the study of decision making in a situation where two or more rational opponents are involved under conditions of conflicting interests. In Chap. 8, zero-sum games are investigated for discrete-time systems based on the model-free ADP method. First, an effective data-based optimal control scheme is developed via the iterative ADP algorithm to find the optimal controller of a class of discrete-time zero-sum games for Roesser type 2-D systems. Since the exact models of many 2-D systems cannot be obtained inherently, the iterative ADP method is expected to avoid the requirement of exact system models. Second, a data-based optimal output feedback controller is developed for solving the zero-sum games of a class of discrete-time systems, whose merit is that neither knowledge of the system model nor information on the system states is required.

In Chap. 9, nonlinear game problems are investigated for continuous-time systems, including infinite horizon zero-sum games, finite horizon zero-sum games and non-zero-sum games. First, for the situations where the saddle point exists, the ADP technique is used to obtain the optimal control pair iteratively. The present approach makes the performance index function reach the saddle point of the zero-sum differential games, while complex existence conditions of the saddle point are avoided. For the situations where the saddle point does not exist, the mixed optimal control pair is obtained to make the performance index function reach the mixed optimum. Then, finite horizon zero-sum games for a class of non-affine nonlinear systems are studied. Moreover, besides the zero-sum games, the non-zero-sum differential games are studied based on a single network ADP algorithm. For zero-sum differential games, two players work on a cost functional together and minimax it. However, for non-zero-sum games, the control objective is to find a set of policies that guarantee the stability of the system and minimize the individual performance function to yield a Nash equilibrium.

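
The distinction between a game that has a saddle point and one that does not can be seen in a static matrix game, which is a much simpler setting than the differential games treated in the book. The sketch below is only an analogy under that assumption: rows belong to a minimizing player, columns to a maximizing player, and the helper names `pure_saddle_point` and `mixed_value_2x2` are hypothetical. The 2x2 mixed-value formula used here is the standard one for 2x2 matrix games without a pure saddle point; the book's notion of a mixed optimal control pair may be defined differently.

```python
from typing import List, Tuple


def pure_saddle_point(cost: List[List[float]]) -> Tuple[float, float, bool]:
    """Return (lower value, upper value, saddle exists) for a matrix game.

    cost[i][j] is paid by the row player (minimizer) to the column
    player (maximizer) when they choose row i and column j.
    """
    n_cols = len(cost[0])
    # Upper value: the minimizer commits first and guards against the worst column.
    upper = min(max(row) for row in cost)
    # Lower value: the maximizer commits first and guards against the best row.
    lower = max(min(row[j] for row in cost) for j in range(n_cols))
    return lower, upper, lower == upper


def mixed_value_2x2(cost: List[List[float]]) -> float:
    """Value of a 2x2 game in mixed strategies, assuming no pure saddle point."""
    (a, b), (c, d) = cost
    return (a * d - b * c) / (a + d - b - c)


# A game with a pure saddle point: entry (row 0, column 1) equals both values.
with_saddle = [[3.0, 5.0], [4.0, 6.0]]
# Matching pennies: no pure saddle point, so the players must randomize.
no_saddle = [[1.0, -1.0], [-1.0, 1.0]]

saddle_result = pure_saddle_point(with_saddle)
pennies_result = pure_saddle_point(no_saddle)
pennies_value = mixed_value_2x2(no_saddle)
print(saddle_result, pennies_result, pennies_value)
```

When the lower and upper values coincide, the order in which the players commit does not matter; when they differ, the mixed value lies between them.
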
In Chap. 10, the optimal control problems of modern wireless networks and automotive engines are studied by using ADP methods. In the first part, a novel learning control architecture is proposed based on adaptive critic designs/ADP, with only a single module instead of two or three modules. The choice of utility function for the present self-learning control scheme makes the present learning process much more efficient than existing learning control methods. The call admission controller can perform learning in real time as well as in off-line environments, and the controller improves its performance as it gains more experience. In the second part, an ADP-based learning algorithm is designed according to certain criteria and calibrated for vehicle operation over the entire operating regime. The algorithm is optimized for the engine in terms of performance, fuel economy, and tailpipe emissions through a significant effort in research and development and calibration processes. After the controller has learned to provide optimal control signals under various operating conditions off-line or on-line, it is applied to perform the task of engine control in real time. The performance of the controller can be further refined and improved through continuous learning in real-time vehicle operations.

Acknowledgments

The authors would like to acknowledge the help and encouragement they received during the course of writing this book.
A great deal of the material presented in this book is based on the research that we conducted with several colleagues and former students, including Q.L. Wei, Y. Zhang, T. Huang, O. Kovalenko, L.L. Cui, X. Zhang, R.Z. Song and N. Cao. We wish to acknowledge especially Dr. J.L. Zhang and Dr. C.B. Qin for their hard work on this book. The authors also wish to thank Prof. R.E. Bellman, Prof. D.P. Bertsekas, Prof. F.L. Lewis, Prof. J. Si and Prof. S. Jagannathan for their excellent books on the theory of optimal control and adaptive dynamic programming. We are very grateful to the National Natural Science Foundation of China (50977008, 60904037, 61034005, 61034002, 61104010) and the Science and Technology Research Program of The Education Department of Liaoning Province (LT2010040), which provided necessary financial support for writing this book.

Shenyang, China; Beijing, China; Chicago, USA
Huaguang Zhang, Derong Liu, Yanhong Luo, Ding Wang

Contents

1 Overview
1.1 Challenges of Dynamic Programming
1.2 Background and Development of Adaptive Dynamic Programming
1.2.1 Basic Structures of ADP
1.2.2 Recent Developments of ADP
1.3 Feedback Control Based on Adaptive Dynamic Programming
1.4 Non-linear Games Based on Adaptive Dynamic Programming
1.5 Summary
References

2 Optimal State Feedback Control for Discrete-Time Systems
2.1 Introduction
2.2 Infinite-Horizon Optimal State Feedback Control Based on DHP
2.2.1 Problem Formulation
2.2.2 Infinite-Horizon Optimal State Feedback Control via DHP
2.2.3 Simulations
2.3 Infinite-Horizon Optimal State Feedback Control Based on GDHP
2.3.1 Problem Formulation
2.3.2 Infinite-Horizon Optimal State Feedback Control Based on GDHP
2.3.3 Simulations
2.4 Infinite-Horizon Optimal State Feedback Control Based on GHJB Algorithm
2.4.1 Problem Formulation
2.4.2 Constrained Optimal Control Based on GHJB Equation
2.4.3 Simulations
2.5 Finite-Horizon Optimal State Feedback Control Based on HDP
2.5.1 Problem Formulation
2.5.2 Finite-Horizon Optimal State Feedback Control Based on HDP
2.5.3 Simulations
