
Multi-Agent Coordination: A Reinforcement Learning Approach

315 pages · 2020 · 12.491 MB · English

Preview Multi-Agent Coordination: A Reinforcement Learning Approach

Multi-Agent Coordination

IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board
Ekram Hossain, Editor in Chief
Jón Atli Benediktsson, David Alan Grier, Elya B. Joffe, Xiaoou Li, Peter Lian, Andreas Molisch, Saeid Nahavandi, Jeffrey Reed, Diomidis Spinellis, Sarah Spurgeon, Ahmet Murat Tekalp

Multi-Agent Coordination: A Reinforcement Learning Approach
Arup Kumar Sadhu
Amit Konar

This edition first published 2021
© 2021 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Arup Kumar Sadhu and Amit Konar to be identified as the authors of this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products, visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data
Names: Sadhu, Arup Kumar, author. | Konar, Amit, author.
Title: Multi-agent coordination : a reinforcement learning approach / Arup Kumar Sadhu, Amit Konar.
Description: Hoboken, New Jersey : Wiley-IEEE, [2021] | Includes bibliographical references and index.
Identifiers: LCCN 2020024706 (print) | LCCN 2020024707 (ebook) | ISBN 9781119699033 (cloth) | ISBN 9781119698999 (Adobe PDF) | ISBN 9781119699026 (ePub)
Subjects: LCSH: Reinforcement learning. | Multiagent systems.
Classification: LCC Q325.6 .S23 2021 (print) | LCC Q325.6 (ebook) | DDC 006.3/1–dc23
LC record available at https://lccn.loc.gov/2020024706
LC ebook record available at https://lccn.loc.gov/2020024707

Cover design: Wiley
Cover image: © Color4260/Shutterstock

Set in 9.5/12.5pt STIX Two Text by SPi Global, Pondicherry, India

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1

Contents

Preface
Acknowledgments
About the Authors

1 Introduction: Multi-agent Coordination by Reinforcement Learning and Evolutionary Algorithms
1.1 Introduction
1.2 Single Agent Planning
1.2.1 Terminologies Used in Single Agent Planning
1.2.2 Single Agent Search-Based Planning Algorithms
1.2.2.1 Dijkstra's Algorithm
1.2.2.2 A* (A-star) Algorithm
1.2.2.3 D* (D-star) Algorithm
1.2.2.4 Planning by STRIPS-Like Language
1.2.3 Single Agent RL
1.2.3.1 Multiarmed Bandit Problem
1.2.3.2 DP and Bellman Equation
1.2.3.3 Correlation Between RL and DP
1.2.3.4 Single Agent Q-Learning
1.2.3.5 Single Agent Planning Using Q-Learning
1.3 Multi-agent Planning and Coordination
1.3.1 Terminologies Related to Multi-agent Coordination
1.3.2 Classification of MAS
1.3.3 Game Theory for Multi-agent Coordination
1.3.3.1 Nash Equilibrium
1.3.3.2 Correlated Equilibrium
1.3.3.3 Static Game Examples
1.3.4 Correlation Among RL, DP, and GT
1.3.5 Classification of MARL
1.3.5.1 Cooperative MARL
1.3.5.2 Competitive MARL
1.3.5.3 Mixed MARL
1.3.6 Coordination and Planning by MAQL
1.3.7 Performance Analysis of MAQL and MAQL-Based Coordination
1.4 Coordination by Optimization Algorithm
1.4.1 PSO Algorithm
1.4.2 Firefly Algorithm
1.4.2.1 Initialization
1.4.2.2 Attraction to Brighter Fireflies
1.4.2.3 Movement of Fireflies
1.4.3 Imperialist Competitive Algorithm
1.4.3.1 Initialization
1.4.3.2 Selection of Imperialists and Colonies
1.4.3.3 Formation of Empires
1.4.3.4 Assimilation of Colonies
1.4.3.5 Revolution
1.4.3.6 Imperialistic Competition
1.4.4 Differential Evolution Algorithm
1.4.4.1 Initialization
1.4.4.2 Mutation
1.4.4.3 Recombination
1.4.4.4 Selection
1.4.5 Off-line Optimization
1.4.6 Performance Analysis of Optimization Algorithms
1.4.6.1 Friedman Test
1.4.6.2 Iman–Davenport Test
1.5 Summary
References

2 Improve Convergence Speed of Multi-Agent Q-Learning for Cooperative Task Planning
2.1 Introduction
2.2 Literature Review
2.3 Preliminaries
2.3.1 Single Agent Q-learning
2.3.2 Multi-agent Q-learning
2.4 Proposed MAQL
2.4.1 Two Useful Properties
2.5 Proposed FCMQL Algorithms and Their Convergence Analysis
2.5.1 Proposed FCMQL Algorithms
2.5.2 Convergence Analysis of the Proposed FCMQL Algorithms
2.6 FCMQL-Based Cooperative Multi-agent Planning
2.7 Experiments and Results
2.8 Conclusions
2.9 Summary
2.A More Details on Experimental Results
2.A.1 Additional Details of Experiment 2.1
2.A.2 Additional Details of Experiment 2.2
2.A.3 Additional Details of Experiment 2.4
References

3 Consensus Q-Learning for Multi-agent Cooperative Planning
3.1 Introduction
3.2 Preliminaries
3.2.1 Single Agent Q-Learning
3.2.2 Equilibrium-Based Multi-agent Q-Learning
3.3 Consensus
3.4 Proposed CoQL and Planning
3.4.1 Consensus Q-Learning
3.4.2 Consensus-Based Multi-robot Planning
3.5 Experiments and Results
3.5.1 Experimental Setup
3.5.2 Experiments for CoQL
3.5.3 Experiments for Consensus-Based Planning
3.6 Conclusions
3.7 Summary
References

4 An Efficient Computing of Correlated Equilibrium for Cooperative Q-Learning-Based Multi-Robot Planning
4.1 Introduction
4.2 Single-Agent Q-Learning and Equilibrium-Based MAQL
4.2.1 Single Agent Q-Learning
4.2.2 Equilibrium-Based MAQL
4.3 Proposed Cooperative MAQL and Planning
4.3.1 Proposed Schemes with Their Applicability
4.3.2 Immediate Rewards in Scheme-I and -II
4.3.3 Scheme-I-Induced MAQL
4.3.4 Scheme-II-Induced MAQL
4.3.5 Algorithms for Scheme-I and II
4.3.6 Constraint ΩQL-I/ΩQL-II (CΩQL-I/CΩQL-II)
4.3.7 Convergence
4.3.8 Multi-agent Planning
4.4 Complexity Analysis
4.4.1 Complexity of CQL
4.4.1.1 Space Complexity
4.4.1.2 Time Complexity
4.4.2 Complexity of the Proposed Algorithms
4.4.2.1 Space Complexity
4.4.2.2 Time Complexity
4.4.3 Complexity Comparison
4.4.3.1 Space Complexity
4.4.3.2 Time Complexity
4.5 Simulation and Experimental Results
4.5.1 Experimental Platform
4.5.1.1 Simulation
4.5.1.2 Hardware
4.5.2 Experimental Approach
4.5.2.1 Learning Phase
4.5.2.2 Planning Phase
4.5.3 Experimental Results
4.6 Conclusion
4.7 Summary
4.A Supporting Algorithm and Mathematical Analysis
References

5 A Modified Imperialist Competitive Algorithm for Multi-Robot Stick-Carrying Application
5.1 Introduction
5.2 Problem Formulation for Multi-Robot Stick-Carrying
5.3 Proposed Hybrid Algorithm
5.3.1 An Overview of ICA
5.3.1.1 Initialization
5.3.1.2 Selection of Imperialists and Colonies
5.3.1.3 Formation of Empires
5.3.1.4 Assimilation of Colonies
5.3.1.5 Revolution
5.3.1.6 Imperialistic Competition
5.4 An Overview of FA
5.4.1 Initialization
5.4.2 Attraction to Brighter Fireflies
5.4.3 Movement of Fireflies
5.5 Proposed ICFA
5.5.1 Assimilation of Colonies
5.5.1.1 Attraction to Powerful Colonies
5.5.1.2 Modification of Empire Behavior
5.5.1.3 Union of Empires
5.6 Simulation Results
5.6.1 Comparative Framework
5.6.2 Parameter Settings
5.6.3 Analysis on Explorative Power of ICFA
5.6.4 Comparison of Quality of the Final Solution
5.6.5 Performance Analysis
5.7 Computer Simulation and Experiment
5.7.1 Average Total Path Deviation (ATPD)
5.7.2 Average Uncovered Target Distance (AUTD)
5.7.3 Experimental Setup in Simulation Environment
5.7.4 Experimental Results in Simulation Environment
5.7.5 Experimental Setup with Khepera Robots
5.7.6 Experimental Results with Khepera Robots
5.8 Conclusion
5.9 Summary
5.A Additional Comparison of ICFA
References

6 Conclusions and Future Directions
6.1 Conclusions
6.2 Future Directions

Index

Preface

Coordination is a fundamental trait in lower-level organisms, as they use their collective effort to serve their goals. Hundreds of interesting examples of coordination are available in nature. For example, ants individually cannot carry a small food item, but collectively they carry quite a voluminous food item to their nest. The tracing of the trajectory of motion of an ant following the pheromone deposited by its predecessor is also attractive. The queen bee in her nest directs the labor bees to specific directions by her dance patterns and gestures to collect food resources. These natural phenomena often remind us of the scope of coordination among agents to utilize their collective intelligence and activities to serve complex goals.

Coordination and planning are closely related terminologies from the domain of multi-robot systems. Planning refers to the collection of feasible steps required to reach a predefined goal from a given position. Coordination, however, indicates the skillful interaction among the agents to generate a feasible planning step. Therefore, coordination is an important issue in the field of multi-robot systems when addressing complex real-world problems. Coordination usually is of three different types: cooperation, competition, and mixed. As evident from its name, cooperation refers to improving the performance of the agents to serve complex goals, which otherwise seem very hard for an individual agent because of the restricted availability of hardware/software resources of the agents or the deadline/energy limits of the tasks. Unlike cooperation, competition refers to serving conflicting goals by two (teams of) agents. For example, in robot soccer, the two teams compete to win the game. Here, each team plans both offensively and defensively to score goals and thus acts competitively. Mixed coordination indicates a mixture of cooperation and competition. In the example of a soccer game, inter-team competition combined with intra-team cooperation is the mixed coordination. The most common usage of coordination in robotics lies in the cooperation of agents to serve a common goal. This book deals with the cooperation of robots/robotic agents to efficiently complete a complex task.
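The preface frames the book around reinforcement-learning-based coordination, and the single-agent Q-learning covered in Chapter 1 (Section 1.2.3.4) is the building block that the later multi-agent variants extend. A minimal tabular sketch follows; the 1-D corridor world, reward values, and parameter settings here are illustrative assumptions for demonstration, not the book's experimental setup.

```python
import random

def q_learning(n_states=6, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning on a toy 1-D corridor (illustrative assumption):
    states 0..n_states-1, actions 0 = left and 1 = right; reaching the
    rightmost state ends the episode with reward 1, every other move gives 0."""
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

random.seed(0)  # fixed seed so the run is repeatable
q = q_learning()
# greedy policy per non-terminal state: 1 means "move right" toward the goal
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(5)]
print(policy)
```

After training, the greedy policy should select "right" in every state, since each rightward step strictly increases the discounted return. Broadly speaking, the multi-agent Q-learning schemes treated in Chapters 2–4 replace the single-agent max operator with joint-action values computed from equilibria (Nash, correlated) or consensus among the agents.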
