
Model-Based Reinforcement Learning. From Data to Continuous Actions with a Python-based Toolbox PDF

275 Pages·2023·11.134 MB·English

Preview Model-Based Reinforcement Learning. From Data to Continuous Actions with a Python-based Toolbox

IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board
Sarah Spurgeon, Editor in Chief
Jón Atli Benediktsson, Andreas Molisch, Diomidis Spinellis, Anjan Bose, Saeid Nahavandi, Ahmet Murat Tekalp, Adam Drobot, Jeffrey Reed, Peter (Yong) Lian, Thomas Robertazzi

Model-Based Reinforcement Learning
From Data to Continuous Actions with a Python-based Toolbox

Milad Farsi and Jun Liu
University of Waterloo, Ontario, Canada

IEEE Press Series on Control Systems Theory and Applications
Maria Domenica Di Benedetto, Series Editor

Copyright © 2023 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our website at www.wiley.com.
Library of Congress Cataloging-in-Publication Data applied for:
Hardback ISBN: 9781119808572

Cover Design: Wiley
Cover Images: © Pobytov/Getty Images; Login/Shutterstock; Sazhnieva Oksana/Shutterstock

Set in 9.5/12.5pt STIX Two Text by Straive, Chennai, India

Contents

About the Authors
Preface
Acronyms
Introduction

1 Nonlinear Systems Analysis
  1.1 Notation
  1.2 Nonlinear Dynamical Systems
    1.2.1 Remarks on Existence, Uniqueness, and Continuation of Solutions
  1.3 Lyapunov Analysis of Stability
  1.4 Stability Analysis of Discrete Time Dynamical Systems
  1.5 Summary
  Bibliography

2 Optimal Control
  2.1 Problem Formulation
  2.2 Dynamic Programming
    2.2.1 Principle of Optimality
    2.2.2 Hamilton–Jacobi–Bellman Equation
    2.2.3 A Sufficient Condition for Optimality
    2.2.4 Infinite-Horizon Problems
  2.3 Linear Quadratic Regulator
    2.3.1 Differential Riccati Equation
    2.3.2 Algebraic Riccati Equation
    2.3.3 Convergence of Solutions to the Differential Riccati Equation
    2.3.4 Forward Propagation of the Differential Riccati Equation for Linear Quadratic Regulator
  2.4 Summary
  Bibliography

3 Reinforcement Learning
  3.1 Control-Affine Systems with Quadratic Costs
  3.2 Exact Policy Iteration
    3.2.1 Linear Quadratic Regulator
  3.3 Policy Iteration with Unknown Dynamics and Function Approximations
    3.3.1 Linear Quadratic Regulator with Unknown Dynamics
  3.4 Summary
  Bibliography

4 Learning of Dynamic Models
  4.1 Introduction
    4.1.1 Autonomous Systems
    4.1.2 Control Systems
  4.2 Model Selection
    4.2.1 Gray-Box vs. Black-Box
    4.2.2 Parametric vs. Nonparametric
  4.3 Parametric Model
    4.3.1 Model in Terms of Bases
    4.3.2 Data Collection
    4.3.3 Learning of Control Systems
  4.4 Parametric Learning Algorithms
    4.4.1 Least Squares
    4.4.2 Recursive Least Squares
    4.4.3 Gradient Descent
    4.4.4 Sparse Regression
  4.5 Persistence of Excitation
  4.6 Python Toolbox
    4.6.1 Configurations
    4.6.2 Model Update
    4.6.3 Model Validation
  4.7 Comparison Results
    4.7.1 Convergence of Parameters
    4.7.2 Error Analysis
    4.7.3 Runtime Results
  4.8 Summary
  Bibliography

5 Structured Online Learning-Based Control of Continuous-Time Nonlinear Systems
  5.1 Introduction
  5.2 A Structured Approximate Optimal Control Framework
  5.3 Local Stability and Optimality Analysis
    5.3.1 Linear Quadratic Regulator
    5.3.2 SOL Control
  5.4 SOL Algorithm
    5.4.1 ODE Solver and Control Update
    5.4.2 Identified Model Update
    5.4.3 Database Update
    5.4.4 Limitations and Implementation Considerations
    5.4.5 Asymptotic Convergence with Approximate Dynamics
  5.5 Simulation Results
    5.5.1 Systems Identifiable in Terms of a Given Set of Bases
    5.5.2 Systems to Be Approximated by a Given Set of Bases
    5.5.3 Comparison Results
  5.6 Summary
  Bibliography

6 A Structured Online Learning Approach to Nonlinear Tracking with Unknown Dynamics
  6.1 Introduction
  6.2 A Structured Online Learning for Tracking Control
    6.2.1 Stability and Optimality in the Linear Case
  6.3 Learning-based Tracking Control Using SOL
  6.4 Simulation Results
    6.4.1 Tracking Control of the Pendulum
    6.4.2 Synchronization of Chaotic Lorenz System
  6.5 Summary
  Bibliography

7 Piecewise Learning and Control with Stability Guarantees
  7.1 Introduction
  7.2 Problem Formulation
  7.3 The Piecewise Learning and Control Framework
    7.3.1 System Identification
    7.3.2 Database
    7.3.3 Feedback Control
  7.4 Analysis of Uncertainty Bounds
    7.4.1 Quadratic Programs for Bounding Errors
  7.5 Stability Verification for Piecewise-Affine Learning and Control
    7.5.1 Piecewise Affine Models
    7.5.2 MIQP-based Stability Verification of PWA Systems
    7.5.3 Convergence of ACCPM
  7.6 Numerical Results
    7.6.1 Pendulum System
    7.6.2 Dynamic Vehicle System with Skidding
    7.6.3 Comparison of Runtime Results
  7.7 Summary
  Bibliography

8 An Application to Solar Photovoltaic Systems
  8.1 Introduction
  8.2 Problem Statement
    8.2.1 PV Array Model
    8.2.2 DC-DC Boost Converter
  8.3 Optimal Control of PV Array
    8.3.1 Maximum Power Point Tracking Control
    8.3.2 Reference Voltage Tracking Control
    8.3.3 Piecewise Learning Control
  8.4 Application Considerations
    8.4.1 Partial Derivative Approximation Procedure
    8.4.2 Partial Shading Effect
  8.5 Simulation Results
    8.5.1 Model and Control Verification
    8.5.2 Comparative Results
    8.5.3 Model-Free Approach Results
    8.5.4 Piecewise Learning Results
    8.5.5 Partial Shading Results
  8.6 Summary
  Bibliography

9 An Application to Low-level Control of Quadrotors
  9.1 Introduction
  9.2 Quadrotor Model
  9.3 Structured Online Learning with RLS Identifier on Quadrotor
    9.3.1 Learning Procedure
    9.3.2 Asymptotic Convergence with Uncertain Dynamics
    9.3.3 Computational Properties
  9.4 Numerical Results
  9.5 Summary
  Bibliography

10 Python Toolbox
  10.1 Overview
  10.2 User Inputs
