Multi-Agent Coordination
IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board
Ekram Hossain, Editor in Chief
Jón Atli Benediktsson, David Alan Grier, Elya B. Joffe, Xiaoou Li, Peter Lian, Andreas Molisch, Saeid Nahavandi, Jeffrey Reed, Diomidis Spinellis, Sarah Spurgeon, Ahmet Murat Tekalp

Multi-Agent Coordination
A Reinforcement Learning Approach
Arup Kumar Sadhu
Amit Konar
This edition first published 2021
© 2021 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Arup Kumar Sadhu and Amit Konar to be identified as the authors of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
Names: Sadhu, Arup Kumar, author. | Konar, Amit, author.
Title: Multi-agent coordination : a reinforcement learning approach / Arup Kumar Sadhu, Amit Konar.
Description: Hoboken, New Jersey : Wiley-IEEE, [2021] | Includes bibliographical references and index.
Identifiers: LCCN 2020024706 (print) | LCCN 2020024707 (ebook) | ISBN 9781119699033 (cloth) | ISBN 9781119698999 (adobe pdf) | ISBN 9781119699026 (epub)
Subjects: LCSH: Reinforcement learning. | Multiagent systems.
Classification: LCC Q325.6 .S23 2021 (print) | LCC Q325.6 (ebook) | DDC 006.3/1–dc23
LC record available at https://lccn.loc.gov/2020024706
LC ebook record available at https://lccn.loc.gov/2020024707

Cover design: Wiley
Cover image: © Color4260/Shutterstock

Set in 9.5/12.5pt STIX Two Text by SPi Global, Pondicherry, India
Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1
Contents

Preface xi
Acknowledgments xix
About the Authors xxi

1 Introduction: Multi-agent Coordination by Reinforcement Learning and Evolutionary Algorithms 1
1.1 Introduction 2
1.2 Single Agent Planning 4
1.2.1 Terminologies Used in Single Agent Planning 4
1.2.2 Single Agent Search-Based Planning Algorithms 10
1.2.2.1 Dijkstra's Algorithm 10
1.2.2.2 A∗ (A-star) Algorithm 11
1.2.2.3 D∗ (D-star) Algorithm 15
1.2.2.4 Planning by STRIPS-Like Language 15
1.2.3 Single Agent RL 17
1.2.3.1 Multiarmed Bandit Problem 17
1.2.3.2 DP and Bellman Equation 20
1.2.3.3 Correlation Between RL and DP 21
1.2.3.4 Single Agent Q-Learning 21
1.2.3.5 Single Agent Planning Using Q-Learning 24
1.3 Multi-agent Planning and Coordination 25
1.3.1 Terminologies Related to Multi-agent Coordination 25
1.3.2 Classification of MAS 26
1.3.3 Game Theory for Multi-agent Coordination 28
1.3.3.1 Nash Equilibrium 31
1.3.3.2 Correlated Equilibrium 36
1.3.3.3 Static Game Examples 38
1.3.4 Correlation Among RL, DP, and GT 40
1.3.5 Classification of MARL 40
1.3.5.1 Cooperative MARL 42
1.3.5.2 Competitive MARL 56
1.3.5.3 Mixed MARL 59
1.3.6 Coordination and Planning by MAQL 84
1.3.7 Performance Analysis of MAQL and MAQL-Based Coordination 85
1.4 Coordination by Optimization Algorithm 87
1.4.1 PSO Algorithm 88
1.4.2 Firefly Algorithm 91
1.4.2.1 Initialization 92
1.4.2.2 Attraction to Brighter Fireflies 92
1.4.2.3 Movement of Fireflies 93
1.4.3 Imperialist Competitive Algorithm 93
1.4.3.1 Initialization 94
1.4.3.2 Selection of Imperialists and Colonies 95
1.4.3.3 Formation of Empires 95
1.4.3.4 Assimilation of Colonies 96
1.4.3.5 Revolution 96
1.4.3.6 Imperialistic Competition 97
1.4.4 Differential Evolution Algorithm 98
1.4.4.1 Initialization 99
1.4.4.2 Mutation 99
1.4.4.3 Recombination 99
1.4.4.4 Selection 99
1.4.5 Off-line Optimization 99
1.4.6 Performance Analysis of Optimization Algorithms 99
1.4.6.1 Friedman Test 100
1.4.6.2 Iman–Davenport Test 100
1.5 Summary 101
References 101

2 Improve Convergence Speed of Multi-Agent Q-Learning for Cooperative Task Planning 111
2.1 Introduction 112
2.2 Literature Review 116
2.3 Preliminaries 118
2.3.1 Single Agent Q-learning 119
2.3.2 Multi-agent Q-learning 119
2.4 Proposed MAQL 123
2.4.1 Two Useful Properties 124
2.5 Proposed FCMQL Algorithms and Their Convergence Analysis 128
2.5.1 Proposed FCMQL Algorithms 129
2.5.2 Convergence Analysis of the Proposed FCMQL Algorithms 130
2.6 FCMQL-Based Cooperative Multi-agent Planning 131
2.7 Experiments and Results 134
2.8 Conclusions 141
2.9 Summary 143
2.A More Details on Experimental Results 144
2.A.1 Additional Details of Experiment 2.1 144
2.A.2 Additional Details of Experiment 2.2 159
2.A.3 Additional Details of Experiment 2.4 161
References 162

3 Consensus Q-Learning for Multi-agent Cooperative Planning 167
3.1 Introduction 167
3.2 Preliminaries 169
3.2.1 Single Agent Q-Learning 169
3.2.2 Equilibrium-Based Multi-agent Q-Learning 170
3.3 Consensus 171
3.4 Proposed CoQL and Planning 173
3.4.1 Consensus Q-Learning 173
3.4.2 Consensus-Based Multi-robot Planning 175
3.5 Experiments and Results 176
3.5.1 Experimental Setup 176
3.5.2 Experiments for CoQL 177
3.5.3 Experiments for Consensus-Based Planning 177
3.6 Conclusions 179
3.7 Summary 180
References 180

4 An Efficient Computing of Correlated Equilibrium for Cooperative Q-Learning-Based Multi-Robot Planning 183
4.1 Introduction 183
4.2 Single-Agent Q-Learning and Equilibrium-Based MAQL 186
4.2.1 Single Agent Q-Learning 187
4.2.2 Equilibrium-Based MAQL 187
4.3 Proposed Cooperative MAQL and Planning 188
4.3.1 Proposed Schemes with Their Applicability 189
4.3.2 Immediate Rewards in Scheme-I and -II 190
4.3.3 Scheme-I-Induced MAQL 190
4.3.4 Scheme-II-Induced MAQL 193
4.3.5 Algorithms for Scheme-I and II 200
4.3.6 Constraint ΩQL-I/ΩQL-II (CΩQL-I/CΩQL-II) 201
4.3.7 Convergence 201
4.3.8 Multi-agent Planning 207
4.4 Complexity Analysis 209
4.4.1 Complexity of CQL 210
4.4.1.1 Space Complexity 210
4.4.1.2 Time Complexity 210
4.4.2 Complexity of the Proposed Algorithms 210
4.4.2.1 Space Complexity 211
4.4.2.2 Time Complexity 211
4.4.3 Complexity Comparison 213
4.4.3.1 Space Complexity 213
4.4.3.2 Time Complexity 214
4.5 Simulation and Experimental Results 215
4.5.1 Experimental Platform 215
4.5.1.1 Simulation 215
4.5.1.2 Hardware 216
4.5.2 Experimental Approach 217
4.5.2.1 Learning Phase 217
4.5.2.2 Planning Phase 217
4.5.3 Experimental Results 218
4.6 Conclusion 226
4.7 Summary 226
4.A Supporting Algorithm and Mathematical Analysis 227
References 228

5 A Modified Imperialist Competitive Algorithm for Multi-Robot Stick-Carrying Application 233
5.1 Introduction 234
5.2 Problem Formulation for Multi-Robot Stick-Carrying 239
5.3 Proposed Hybrid Algorithm 242
5.3.1 An Overview of ICA 242
5.3.1.1 Initialization 242
5.3.1.2 Selection of Imperialists and Colonies 243
5.3.1.3 Formation of Empires 243
5.3.1.4 Assimilation of Colonies 244
5.3.1.5 Revolution 244
5.3.1.6 Imperialistic Competition 245
5.4 An Overview of FA 247
5.4.1 Initialization 247
5.4.2 Attraction to Brighter Fireflies 247
5.4.3 Movement of Fireflies 248
5.5 Proposed ICFA 248
5.5.1 Assimilation of Colonies 251
5.5.1.1 Attraction to Powerful Colonies 251
5.5.1.2 Modification of Empire Behavior 251
5.5.1.3 Union of Empires 252
5.6 Simulation Results 254
5.6.1 Comparative Framework 254
5.6.2 Parameter Settings 254
5.6.3 Analysis on Explorative Power of ICFA 254
5.6.4 Comparison of Quality of the Final Solution 255
5.6.5 Performance Analysis 258
5.7 Computer Simulation and Experiment 265
5.7.1 Average Total Path Deviation (ATPD) 265
5.7.2 Average Uncovered Target Distance (AUTD) 265
5.7.3 Experimental Setup in Simulation Environment 265
5.7.4 Experimental Results in Simulation Environment 266
5.7.5 Experimental Setup with Khepera Robots 268
5.7.6 Experimental Results with Khepera Robots 269
5.8 Conclusion 270
5.9 Summary 272
5.A Additional Comparison of ICFA 272
References 275

6 Conclusions and Future Directions 281
6.1 Conclusions 281
6.2 Future Directions 283

Index 285
Preface
Coordination is a fundamental trait in lower-level organisms, as they use their collective effort to serve their goals. Hundreds of interesting examples of coordination are available in nature. For example, an ant individually cannot carry even a small food item, but ants collectively carry quite voluminous food to their nest. The trajectory of motion of an ant following the pheromone deposited by its predecessor is also fascinating. The queen bee in her nest directs the labor bees in specific directions by her dance patterns and gestures to collect food resources. These natural phenomena often remind us of the scope of coordination among agents to utilize their collective intelligence and activities to serve complex goals.

Coordination and planning are closely related terminologies from the domain of multi-robot systems. Planning refers to the collection of feasible steps required to reach a predefined goal from a given position, whereas coordination indicates the skillful interaction among the agents needed to generate a feasible planning step. Coordination is therefore an important issue in multi-robot systems that address complex real-world problems. Coordination is usually of three different types: cooperation, competition, and mixed. As evident from their names, cooperation refers to agents combining their efforts to serve complex goals, which would otherwise be very hard for an individual agent because of the restricted availability of its hardware/software resources or the deadline/energy limits of the tasks. Unlike cooperation, competition refers to serving conflicting goals by two agents (or teams of agents). For example, in robot soccer, the two teams compete to win the game; each team plans both offensively and defensively to score goals and thus acts competitively. Mixed coordination indicates a mixture of cooperation and competition. In the soccer example, inter-team competition combined with intra-team cooperation constitutes mixed coordination. The most common usage of coordination in robotics lies in the cooperation of agents to serve a common goal. This book deals with the cooperation of robots/robotic agents to efficiently complete a complex task.