Studies in Systems, Decision and Control
Volume 120

Marcin Szuster, Zenon Hendzel
Intelligent Optimal Adaptive Control for Mechatronic Systems

Series editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Systems, Decision and Control” (SSDC) covers both new developments and advances, as well as the state of the art, in the various areas of broadly perceived systems, decision making and control, quickly, up to date and with a high quality. The intent is to cover the theory, applications, and perspectives on the state of the art and future developments relevant to systems, decision making, control, complex processes and related areas, as embedded in the fields of engineering, computer science, physics, economics, social and life sciences, as well as the paradigms and methodologies behind them. The series contains monographs, textbooks, lecture notes and edited volumes in systems, decision making and control spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems, and others. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure, which enable both a wide and rapid dissemination of research output.
More information about this series at http://www.springer.com/series/13304

Marcin Szuster
Department of Applied Mechanics and Robotics, Faculty of Mechanical Engineering and Aeronautics
Rzeszow University of Technology
Rzeszow, Poland

Zenon Hendzel
Department of Applied Mechanics and Robotics, Faculty of Mechanical Engineering and Aeronautics
Rzeszow University of Technology
Rzeszow, Poland

ISSN 2198-4182
ISSN 2198-4190 (electronic)
Studies in Systems, Decision and Control
ISBN 978-3-319-68824-4
ISBN 978-3-319-68826-8 (eBook)
https://doi.org/10.1007/978-3-319-68826-8
Library of Congress Control Number: 2017956748

© Springer International Publishing AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

From Marcin Szuster
To my wife Sylwia, for supporting me all the way...

From Zenon Hendzel
To my grandsons Julian, Jan, Dominik, …
To learn is not to know.

Contents

1 Introduction
  1.1 Artificial Intelligence and Neural Networks
  1.2 Learning with a Critic
  1.3 Scope of Study
  References

2 Object of Research
  2.1 Two-Wheeled Mobile Robot
    2.1.1 Description of the Kinematics of a Mobile Robot
    2.1.2 Description of the Dynamics of a Mobile Robot
  2.2 Robotic Manipulator
    2.2.1 Description of the Kinematics of a Robotic Manipulator
    2.2.2 Description of the Dynamics of a Robotic Manipulator
  References

3 Intelligent Control of Mechatronic Systems
  3.1 Methods for Control of Nonlinear Systems
  3.2 Neural Control
    3.2.1 Random Vector Functional Link Neural Network
    3.2.2 Neural Network with Gaussian-Type Activation Functions
  References

4 Optimal Control Methods for Mechatronic Systems
  4.1 Bellman’s Dynamic Programming
  4.2 Linear-Quadratic Regulator
  4.3 Pontryagin’s Maximum Principle
  4.4 Summary
  References

5 Learning Methods for Intelligent Systems
  5.1 Supervised Learning
    5.1.1 Steepest Descent Algorithm
    5.1.2 Variable Metric Algorithm
    5.1.3 Levenberg–Marquardt Algorithm
    5.1.4 Conjugate Gradient Method
  5.2 Learning with a Critic
    5.2.1 Q-Learning Algorithm
  5.3 Learning Without a Teacher
    5.3.1 Winner-Take-All Networks
    5.3.2 Winner-Take-Most Networks
  References

6 Adaptive Dynamic Programming - Discrete Version
  6.1 Neural Dynamic Programming
  6.2 Model-Based Learning Methods
    6.2.1 Heuristic Dynamic Programming
    6.2.2 Dual-Heuristic Dynamic Programming
    6.2.3 Global Dual-Heuristic Dynamic Programming
  6.3 Model-Free Learning Methods
    6.3.1 Action-Dependent Heuristic Dynamic Programming
  References

7 Control of Mechatronic Systems
  7.1 Tracking Control of a WMR and a RM with a PD Controller
    7.1.1 Synthesis of PD-Type Control
    7.1.2 Simulation Tests
    7.1.3 Conclusions
  7.2 Adaptive Tracking Control of a WMR
    7.2.1 Synthesis of an Adaptive Control Algorithm
    7.2.2 Simulation Tests
    7.2.3 Conclusions
  7.3 Neural Tracking Control of a WMR
    7.3.1 Synthesis of a Neural Control Algorithm
    7.3.2 Simulation Tests
    7.3.3 Conclusions
  7.4 Heuristic Dynamic Programming in Tracking Control of a WMR
    7.4.1 Synthesis of HDP-Type Control
    7.4.2 Simulation Tests
    7.4.3 Conclusions
  7.5 Dual-Heuristic Dynamic Programming in Tracking Control of a WMR and a RM
    7.5.1 Synthesis of DHP-Type Control
    7.5.2 Simulation Tests
    7.5.3 Conclusions
  7.6 Globalised Dual-Heuristic Dynamic Programming in Tracking Control of a WMR and a RM
    7.6.1 Synthesis of GDHP-Type Control
    7.6.2 Simulation Tests
    7.6.3 Conclusions
  7.7 Action Dependent Heuristic Dynamic Programming in Tracking Control of a WMR
    7.7.1 Synthesis of ADHDP-type Control
    7.7.2 Simulation Tests
    7.7.3 Conclusions
  7.8 Behavioural Control of WMR’s Motion
    7.8.1 Behavioural Control Synthesis
    7.8.2 Simulation Tests
    7.8.3 Conclusions
  7.9 Summary
    7.9.1 Selection of Value of the Future Reward Discount Factor γ
  References

8 Reinforcement Learning in the Control of Nonlinear Continuous Systems
  8.1 Classical Reinforcement Learning
    8.1.1 Control Synthesis, Stability of a System, Reinforcement Learning Algorithm
    8.1.2 Simulation Tests
    8.1.3 Conclusions
  8.2 Approximation of Classical Reinforcement Learning
    8.2.1 Control System Synthesis, Stability of the System, Reinforcement Learning Algorithm
    8.2.2 Simulation Tests
    8.2.3 Conclusions
  8.3 Reinforcement Learning in the Actor-Critic Structure
    8.3.1 Synthesis of Control System, System Stability, Reinforcement Learning Algorithm
    8.3.2 Simulation Tests
    8.3.3 Conclusions
  8.4 Reinforcement Learning of Actor-Critic Type in the Optimal Adaptive Control
    8.4.1 Control Synthesis, Stability of a System, Reinforcement Learning Algorithm
    8.4.2 Simulation Tests
    8.4.3 Conclusions
  8.5 Implementation of Critic’s Adaptive Structure in Optimal Control
    8.5.1 Control Synthesis, Critic’s Learning Algorithm, Stability of a System
    8.5.2 Simulation Tests
    8.5.3 Conclusions
  References

9 Two-Person Zero-Sum Differential Games and H∞ Control
  9.1 H∞ Control
  9.2 A Two-Person Zero-Sum Differential Game
  9.3 Application of a Two-Person Zero-Sum Differential Game in Control of the Drive Unit of a WMR
    9.3.1 Simulation Tests
    9.3.2 Conclusions
  9.4 Application of a Neural Network in the Two-Person Zero-Sum Differential Game in WMR Control
    9.4.1 Simulation Tests
    9.4.2 Conclusions
  References

10 Experimental Verification of Control Algorithms
  10.1 Description of Laboratory Stands
    10.1.1 WMR Motion Control Stand
    10.1.2 RM Motion Control Stand
  10.2 Analysis of the PD Control
    10.2.1 Analysis of the WMR Motion Control
    10.2.2 Analysis of the RM Motion Control
    10.2.3 Conclusions
  10.3 Analysis of the Adaptive Control
    10.3.1 Analysis of the WMR Motion Control
    10.3.2 Conclusions
  10.4 Analysis of the Neural Control
    10.4.1 Analysis of the WMR Motion Control
    10.4.2 Conclusions
  10.5 Analysis of the HDP Control
    10.5.1 Analysis of the WMR Motion Control
    10.5.2 Conclusions
  10.6 Analysis of the DHP Control
    10.6.1 Analysis of the WMR Motion Control
    10.6.2 Analysis of the RM Motion Control
    10.6.3 Conclusions
  10.7 Analysis of the GDHP Control
Description: