AUTONOMOUS EVOLUTION OF SENSORY AND ACTUATOR DRIVER LAYERS THROUGH ENVIRONMENTAL CONSTRAINTS

By

TAE HOON ANTHONY CHOI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2002

ACKNOWLEDGEMENTS

I would like to thank my family, colleagues, and faculty at the University of Florida. I have received help from so many gracious people that it would be impossible to point out just a few.

I would like to thank my committee members, Dr. A. Antonio Arroyo, Dr. Michael C. Nechyba, Dr. Eric M. Schwartz, and Dr. Carl Crane, for sharing their technical expertise and insight. But more importantly, I would like to thank them for their friendship, which I will cherish forever. I want to thank Matthew Moore and other colleagues for their invaluable help. I am most grateful to my parents, Chang Ik and Kuk Sun Choi, my wife, Hyong Sook, and my two-year-old son, Ethan, for their love and for their support.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ii
TABLE OF CONTENTS iii
LIST OF TABLES vi
LIST OF FIGURES vii

CHAPTER

1 INTRODUCTION 1

2 SURVEY OF CURRENT RESEARCH IN AUTONOMOUS MOBILE ROBOT LEARNING 5
2.1 Reinforcement Learning 6
2.2 Genetic Programming 7
2.3 Neural Networks 9
2.4 Fuzzy Logic 9
2.5 Hybrid Approaches 10

3 ISSUES IN ROBOT LEARNING 12
3.1 Issues in Robot Learning for a Physical Robot 12
3.2 Issues in Robot Learning in Simulations 13
3.3 Issues in Evaluation of Robot Learning 14

4 AUTONOMOUS MOBILE ROBOTS 15
4.1 Talrik II™ 15
4.2 Mantaray 16

5 INNATE LEARNING 18
5.1 Introduction 19
5.2 Innate Learning Algorithm 21
5.2.1 Assumptions about Autonomous Mobile Robots 22
5.2.2 Assumptions about the Highly Structured Environment 22
5.2.3 Assumptions about Innate Knowledge 23
5.2.4 Autonomous Calibration of the Sensors 24
5.2.5 Autonomous Configuration of Its Actuators 25
5.2.6 Autonomous Configuration of Its Sensors 28
5.3 Experiment 31
5.3.1 Autonomous Mobile Robots 32
5.3.2 Environment 32
5.3.3 Result 32
5.4 Conclusion 34

6 ENVIRONMENTAL REINFORCEMENT LEARNING (ERL) 36
6.1 Introduction 36
6.2 Environmental Reinforcement Learning 38
6.2.1 Assumptions 40
6.2.2 Environmental Reinforcement (ER) and Environmental Considerations 40
6.2.3 Mutual Refinement Process 41
6.2.4 Faster Convergence through "Fine" and "Rough" Adjustment Trials 41
6.3 Experimental Result 42
6.3.1 Mantaray: Autonomous Mobile Agent 42
6.3.2 Learning to Traverse a Straight Line 42
6.3.3 Learning to Turn 180 Degrees 44
6.3.4 Speed and Turning Offset (Adjustment Values) 45
6.3.5 Mutual Refinement 46
6.3.6 Experimental Results and Analysis 46
6.4 Conclusion 48

7 AUTONOMOUS EVOLUTION OF SENSORY AND ACTUATOR DRIVER LAYERS THROUGH ENVIRONMENTAL CONSTRAINTS 50
7.1 Introduction 51
7.2 AEDEC 52
7.2.1 Environment 54
7.2.2 Innate Knowledge 55
7.2.3 Effect of Highly Structured Environment and Innate Knowledge on the Reinforcement Learning Search Space 57
7.2.4 AEDEC Architecture 57
7.3 Proposed Experiment 60

8 SENSORY DRIVER LAYER 61
8.1 Innate Knowledge in Sensory Driver Layer 63
8.2 Scenarios 63
8.3 Learning Algorithm for the Sensory Driver Layer 67
8.4 Analysis of the Sensory Driver Layer 68
8.4.1 Recognition of the Original Scenarios 68
8.4.2 Recognition of the Original Scenarios with One Malfunctioning Sensor 70
8.4.3 Recognition of the Original Scenarios with Two Malfunctioning Sensors 72
8.4.4 Detailed Sensor Template Mapping for Objects 73
8.4.5 Object Morphing 75
8.5 Sensory Driver Layer's Ability to Compensate for Sensor Malfunctions 76
8.6 Real-time Self-Correcting Sensory Driver Layer 77

9 ACTUATOR DRIVER LAYER 79
9.1 Innate Knowledge 80
9.2 Autonomous Configuration of the Actuators 80
9.3 Learning Algorithm for the Actuator Driver Layer 82
9.3.1 Assumptions 83
9.3.2 The Highly Structured Environment 83
9.3.3 Modifications to the Autonomous Mobile Robots 83
9.3.4 Incorporation of the Sensory Driver Layer 84
9.3.5 Algorithm for Learning to Calibrate Mismatched Actuators 85
9.3.6 Algorithm for Learning the Distance Metrics 86
9.3.7 Algorithm for Learning the 180 Degree Turn 88
9.3.8 Qualitative Analysis of the Actuator Driver Layer 90

10 ANALYSIS OF THE AEDEC SYSTEM 93
10.1 Obstacle Avoidance Behavior 94
10.2 Wall Following Behavior 96

11 CONCLUSION 99

APPENDIX

A TABLE OF MODIFIED EUCLIDEAN DISTANCE OF EACH SCENARIO WITH RESPECT TO THE DATABASE OF SCENARIO TEMPLATES 101

B TABLE OF MODIFIED EUCLIDEAN DISTANCE OF EACH SCENARIO WITH RESPECT TO THE DATABASE OF SCENARIO TEMPLATES WITH ONE SENSOR MALFUNCTIONING 108

C TABLE OF MODIFIED EUCLIDEAN DISTANCE OF EACH SCENARIO WITH RESPECT TO THE DATABASE OF SCENARIO TEMPLATES WITH TWO SENSORS MALFUNCTIONING 115

D TABLE OF MODIFIED EUCLIDEAN DISTANCE OF EACH SCENARIO WITH RESPECT TO THE NEW DATABASE OF SCENARIO TEMPLATES WITH TWO SENSORS MALFUNCTIONING 122

REFERENCES 129

BIOGRAPHICAL SKETCH 135

LIST OF TABLES

TABLE

5.1 Four possible combinations of the motor directions 26
8.1 The complete list of the scenarios and descriptions 64
8.2 Partial table of modified Euclidean distance of each scenario with respect to the database of scenario templates from Appendix A 69
8.3 Partial table of modified Euclidean distance of each scenario with respect to the database of scenario templates with one sensor malfunctioning from Appendix B 71
8.4 Partial table of modified Euclidean distance of each scenario with respect to the database of scenario templates with two sensors malfunctioning from Appendix C 72
8.5 Partial table of modified Euclidean distance of each scenario with respect to the new database of scenario templates with two sensors malfunctioning from Appendix D 77
9.1 Four possible combinations of the motor directions 81

LIST OF FIGURES

FIGURE

2.1 The standard reinforcement learning model 6
4.1 Talrik II™ 16
4.2 Mantaray with strabismic sensory array 17
5.1 Innate Learning algorithm for motor configuration 28
5.2 Example of the sensor pattern created for maximum readings 29
5.3 Innate Learning algorithm for sensor configuration 30
6.1 Relative error vs. the number of revolutions, where Roff equals the difference in radii of the two wheels 37
6.2 The Environmental Reinforcement Learning (ERL) architecture 39
6.3 Environmental setup for the experiment: (a) Environmental Reinforcement (ER) for straight line traversing; (b) actual picture of the environment 43
6.4 Environmental Reinforcement (ER) for turning 180 degrees 44
6.5 Summary of experimental data for Turning and Speed coefficients. Trials one to 99 represent "rough" trials; the rest represent "fine" trials. (a) and (b): Turning and Speed coefficients vs. trials, plotted for a single battery voltage level of 9.81 volts. (c) and (d): accumulated Turning and Speed coefficients for each battery level 47
7.1 AEDEC architecture for sensory driver layer 59
7.2 AEDEC architecture for actuator driver layer (IK = Innate Knowledge) 59
8.1 Picture of WFL30F scenario. In the picture, Talrik II™ is facing the top of the picture 65
8.2 A picture of CBL scenario. In the picture, Talrik II™ is facing the top of the picture 66
8.3 Talrik II™ Sensor Layout Diagram [20] 70
8.4 Scenario mapping of a 5¼ x 5¼ inch block in the 180 degree region in front of the AMR. Each radial line represents 10 degree offsets and each point represents one inch offsets 74
8.5 The plot of lengthening a 5¼ inch object to a 27¼ inch object in 2 inch increments 75
9.1 Refined Innate Learning algorithm for motor configuration 82
9.2 Environment used for calibration of mismatched actuators 85
9.3 Plot of the right motor speed values 87
9.4 Plot of time needed to travel one foot 88
9.5 Part of the environment used for learning to turn 180 degrees 89
9.6 Plot of time needed to turn 180 degrees 90
9.7 The path of the AMR before learning (a) and after learning (b) 92
10.1 Obstacle avoidance code 94
10.2 The path of the obstacle avoidance program for Mantaray (a) and (c) and for Talrik II™ (b) and (d) 95
10.3 Wall following code (right) 97
10.4 The path of the wall following program for Talrik II™ (a) and for Mantaray (b) 98

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

AUTONOMOUS EVOLUTION OF SENSORY AND ACTUATOR DRIVER LAYERS THROUGH ENVIRONMENTAL CONSTRAINTS

By

Tae Hoon Anthony Choi

December 2002

Chair: Dr. A. Antonio Arroyo
Cochair: Dr. Michael C. Nechyba
Major Department: Electrical and Computer Engineering

Although future applications for autonomous mobile robots (AMRs) are practically limitless, researchers must first address some of the hurdles blocking wide acceptance of AMRs as a viable solution. This research addresses some of these issues through the realization of Innate Learning (IL), Environmental Reinforcement Learning (ERL), and Autonomous Evolution of sensory and actuator Driver layers through Environmental Constraints (AEDEC). Innate Learning (IL) is a learning mechanism that takes advantage of innate knowledge to improve and enhance learning. Through the use of innate knowledge of its embodiment and its environment, IL provides a simple mechanism to autonomously detect and correct discrete production errors (i.e., errors in wiring of sensors and actuators). Environmental Reinforcement Learning (ERL) is a real-time learning architecture for refining primitive behaviors through interaction within highly structured environments. The ERL architecture allows self-calibration of sensors, actuators, and primitive behaviors by using a structured environment (i.e., an obstacle course) to provide real-time feedback on a robot's performance. Through the refinement process, lower cost parts can be used and damaged parts can be replaced without affecting the rest of the system. Finally, the AEDEC learning architecture is the culmination of the previous research, namely Innate Learning (IL) and Environmental Reinforcement Learning (ERL). By incorporating IL and ERL, AEDEC asserts that sensory and actuator driver layers can be autonomously programmed from a simple set of innate knowledge guided by static constraints from a highly structured environment. Through the use of innate knowledge and a highly structured environment, AEDEC allows a robot to autonomously create abstractions (drivers) of sensory information and actuation controls, consequently reducing the workload of a human programmer. Since different types of robots (walking, two wheels, caterpillar treads, etc.) can be trained in the same environment, AEDEC permits code (high-level behavior) portability between different types of robots.
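The abstract's central claim, that learned sensory and actuator driver layers let a high-level behavior port unchanged across robot types, can be made concrete with a short sketch. The C fragment below is not taken from the dissertation; it is a minimal, hypothetical rendering under the assumption that each robot exposes its learned calibration behind a common interface. All identifiers (driver_layer_t, avoid_obstacles, the demo stubs) are invented for illustration.

/*
 * Hypothetical sketch only: one way a learned driver layer could present
 * a uniform interface so that a high-level behavior stays portable.
 * Nothing here reproduces the dissertation's actual code.
 */
#include <stdio.h>

/* Abstract scenario labels a sensory driver layer might emit after
   matching raw sensor readings against learned scenario templates. */
typedef enum { OPEN_SPACE, BLOCKED } scenario_t;

/* Each robot (e.g., Talrik II, Mantaray) would supply its own learned
   implementations behind the same function pointers. */
typedef struct {
    scenario_t (*sense)(void);   /* raw sensors -> scenario label   */
    void (*forward)(int inches); /* calibrated straight-line travel */
    void (*turn180)(void);       /* calibrated 180 degree turn      */
} driver_layer_t;

/* High-level behavior written only against the driver layer; calibration
   details (motor mismatch, distance metrics) live below this interface. */
void avoid_obstacles(const driver_layer_t *d, int steps)
{
    for (int i = 0; i < steps; i++) {
        if (d->sense() == BLOCKED)
            d->turn180();
        else
            d->forward(12);
    }
}

/* Stub implementations so the sketch runs as an ordinary C program. */
static int tick = 0;
static scenario_t demo_sense(void) { return (++tick % 4 == 0) ? BLOCKED : OPEN_SPACE; }
static void demo_forward(int inches) { printf("forward %d inches\n", inches); }
static void demo_turn180(void) { printf("turn 180 degrees\n"); }

int main(void)
{
    driver_layer_t robot = { demo_sense, demo_forward, demo_turn180 };
    avoid_obstacles(&robot, 8);
    return 0;
}

Swapping in a different robot would mean replacing only the three function pointers with that platform's learned drivers; avoid_obstacles itself never changes, which is the portability property the abstract attributes to AEDEC.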