
DTIC ADA480609: Anytime Coevolution of Form and Function PDF


Anytime Coevolution of Form and Function

Magdalena D. Bugajska and Alan C. Schultz
Navy Center for Applied Research in Artificial Intelligence
Naval Research Laboratory, Code 5515
Washington, DC 20375
{magda,schultz}@aic.nrl.navy.mil
tel. (202) 404-4946
fax. (202) 767-2166

Report documentation (Standard Form 298): report date 2003; performing organization: Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence (NCARAI), 4555 Overlook Avenue SW, Washington, DC 20375; distribution statement: approved for public release, distribution unlimited; security classification: unclassified; number of pages: 8. Supplementary notes: In Proceedings of 2003 Congress on Evolutionary Computation (CEC-2003), Canberra, Australia, December 8 - 12, 2003.

Abstract - This paper describes an approach to continuous coevolution of form (the morphology) and function (the control behavior) for autonomous vehicles. This study focuses on coevolution of the characteristics such as beam width and range of individual sensors in the sensor suite, and the reactive strategies for collision-free navigation for an autonomous micro air vehicle. The results of the evolution of the system in a fixed simulation model were compared to case-based anytime learning (also called continuous and embedded learning) where the simulation model was updated over time to better match changes in the environment.

1 Introduction

Autonomous vehicles that can change their own morphology on the fly are highly desirable in many domains. For example, the ability of an air vehicle to modify its airframe and the configuration of its control surfaces during certain stages of the flight, such as takeoffs, attacks, or landings, would have a direct impact on the system's efficiency, performance, and safety. This shape-shifting or morphing mechanism would also be desirable in an Urban Search and Rescue robot to enhance its ability to traverse difficult internal structures within collapsed buildings.

Evolutionary algorithms have been successfully applied to automate the design of robots' morphology, the design of the controllers, and more recently to coevolution of form and function. It is our belief that the natural process of coevolving the form and function of living organisms can be applied to the design of morphology and control behaviors of autonomous vehicles in order to simplify the design process and improve the performance of the system. In our work, coevolution of form and function has been applied to the micro air vehicle (MAV) domain. The design of the sensory payload and the controller for an MAV is complicated by the size of the vehicle (wingspan on the order of 6 inches), its limited payload, and a great variety of possible applications. The design issue addressed explicitly in this study is minimization of power requirements. It is assumed that power efficiency is inversely proportional to the coverage of the sensor suite. The work presented here is an extension of the research published in [Bugajska2000] and [Bugajska2002].

In addition, an important problem arising for all autonomous vehicles that are expected to perform tasks for extended periods is how to adapt the components of the system in response to unexpected changes in the environment or in their own capabilities in close to real time. Continuous and embedded learning (also called anytime learning) [Grefenstette1992] is a general approach to continuous learning in changing environments. The vehicle's learning module continuously tests new strategies in an embedded simulation model which is updated in response to changes in the environment. In the past, this approach has been successfully applied to learning and adaptation of robotic behaviors in dynamic environments as well as in situations where the robot experiences sensor failures. This study focuses specifically on the continuous coevolution of a minimal sensor suite, which allows for the most efficient collision-free navigation, in a changing environment. Approaches to evolution in a simulation without feedback from the task environment are compared to case-based continuous and embedded learning [Ramsey1994] in a simulation where such feedback exists.

The remainder of this paper briefly outlines the related work and then continues with a description of our implementation of coevolution of the characteristics of a sensor suite and collision-free navigation of an MAV. The simulated environment, aircraft, and sensors are described along with the details of the learning system. Finally, the initial results of the learning experiments in a changing environment are discussed, and the future direction of the research is outlined.

2 Coevolution of Form and Function

In recent years, the results of the evolution of behaviors for autonomous agents in simulation ([Nolfi1994, Harvey1992, Schultz1996, Potter2001]) and in the real world ([Floreano1996]), and research in automation of structural design ([Husbands1996, Funes1997, Lichtensteiger1999, Lund1997, Mark1998]), have led researchers to explore the concept of coevolution of form and function for autonomous agents. [Cliff1993] and [Cliff1996] present research on concurrent evolution of neural network controllers and visual sensor morphologies for visually guided tracking. [Sims1994] presents a system for the coevolution of morphology and behavior of virtual creatures that compete in a physically simulated three-dimensional world. Similar work is presented in [Hornby2001], where the body and brain of the creatures are evolved using Lindenmayer systems as a generative encoding. In [Lee1996] a hybrid genetic programming/genetic algorithm approach is presented that allows for evolution of both controllers and robot bodies to achieve behavior-specified tasks. [Balakrishnan1996] presents a comparative study of evolution of a control system given a fixed sensor suite, and coevolution of sensor characteristics (placement and range) and the control architecture for the task of box pushing. In previous work [Bugajska2000] and [Bugajska2002], we explored coevolution of the beam width of the individual sensors in the sensor suite and the collision-free navigation behavior in the context of different controller representations and coevolution approaches in micro air vehicles. This study extends the previous work by exploring the coevolution of form and function in the context of changing environments; we combine the coevolution of form and function with the anytime learning technique. In addition, this study extends our previous work by evolving the sensing range of the individual sensors in addition to the beam width.

2.1 Representation

In this study, each individual (chromosome) in the population contains the genetic material describing both the morphology and the control behavior of the autonomous agent. The characteristics of the sensor suite are encoded in a floating-point vector with elements for beam width and the range of individual sensors in the suite (Section 5.1). The collision-free navigation behavior is represented as a set of stimulus-response rules (Section 4.1).

2.2 Environment

A high-fidelity, 3-D flight simulator (Fig. 1), which includes an accurate parameterized model of a 6-inch MAV, has been used to model the environment and the vehicle. The simulation allows the user to control the aircraft by specifying only the turn rate values; the speed and altitude of the plane are adjusted appropriately by low-level PID controllers. In this study, the MAV is controlled by specifying discrete turning rates between -20 and 20 degrees in 5-degree increments.

Figure 1: The screenshot of the 3-D simulated environment used for the experiments. The white sphere marks the target and dark gray (green) spheres with light gray cylinders mark the obstacles (trees).

The trees (obstacles) are modeled as spheres on top of cylinders. Any contact between the plane and the tree constitutes a collision. The density of trees is user-defined as the number of trees per square foot assuming uniform distribution and varied from 1.25 to 5.0 trees per hundred square feet. At the beginning of each simulated flight, the MAV is placed at a random location within a specified area away from the target. The target is stationary and reachable during every trial.

The simulated MAV has a sensor which returns the relative range and bearing to the target. It is also equipped with an array of range sensors positioned symmetrically along the direction of flight and radially from the center of the vehicle. Each sensor is capable of detecting obstacles and returning the range to the closest object within its field of view. The beam width and the range of the individual range sensors are evolved along with the control behavior.

2.3 Fitness Function

The morphology of the sensor suite and the control behavior of the MAV are evolved in simulation. During each evaluation, a number of episodes is performed. Each episode begins with placement of the MAV at a random distance away from the target facing in a random direction, which is followed by a random placement of trees in the environment. An episode ends with either a successful arrival of the MAV at the target location, a loss of the MAV due to energy/time running out, or a loss of the MAV due to collision with an obstacle. The fitness of the individual is based on the quality of the sensor suite and execution of the task and, as in our previous work, is defined as follows:

    if (reached the goal)
        payoff is based on
            the distance MAV traveled (Section 4.3)
            PLUS
            the quality of the sensor suite (Section 5.3)
    else if (collision or timeout occurred)
        payoff is based on
            the distance away from target (Section 4.3)

It should be noted that the contribution due to the quality of the sensor suite is considered only once the task performance is satisfactory, and that payoff is only assigned at the end of the episode.

3 Continuous and Embedded Learning

The main focus of this study is coevolution of form and function for extended periods in changing environments. Continuous and embedded learning ([Grefenstette1992, Ramsey1994]) is a general approach to continuous learning in a changing environment. As shown in Figure 2, the agent's learning module continuously tests new strategies against a simulation model of the environment, and dynamically updates the knowledge base (behaviors) used by the agent. The execution module controls the agent's interaction with the environment, and includes a monitor process that can dynamically modify the simulation model based on its observation of the environment. When the simulation model is modified, the learning process continues with the modified model. The learning system is assumed to operate indefinitely, and the execution system uses the results of learning as they become available. This learning approach was previously used to continuously evolve tracking behaviors ([Grefenstette1992]) and door traversing behavior ([Schultz2000]) in the face of a changing environment and changes in the agent's own capabilities such as sensor failures.

Figure 2: The anytime (also known as continuous and embedded) learning model. [Diagram labels: Execution System (Monitor, Decision Maker, Active Knowledge Base), Environment, Learning System (Decision Maker, Simulation Model, Test Knowledge Base, Learning Method).]

In this instantiation of anytime learning, the only measurable aspect of the environment is the density of the obstacles (trees). When the monitor detects a change, the environment model is updated and the learning system is re-initialized. Currently, the system is re-initialized using a combination of previously evolved strategies, chosen based on their fitness and the similarity of the model under which they were evolved, and a simple default strategy.

4 Evolution of Function

The performance of the system is determined by the agent's ability to perform the task. In our study, the MAV must be able to efficiently and safely navigate among obstacles (trees) to a target location. The desired behavior should maximize the number of times the MAV reaches the target location while minimizing the distance traveled to that location. Every single evaluation is performed in a randomly created environment (random MAV position and orientation, random but uniform tree placement, etc.) with complexity determined by the tree density.

4.1 Problem Representation

In this study, the collision-free navigation behavior is implemented as a collection of stimulus-response rules (see [Bugajska2002] for an alternative approach). Each stimulus-response rule consists of conditions that match against the current sensors of the agent, and an action that suggests the action to be performed by it. For example, a rule (gene) which states that if there is an obstacle fairly close and roughly ahead of the vehicle, even when the goal is also ahead of it, the vehicle should turn left, could be represented as:

    RULE 122
    IF sonar4 < 45 AND bearing = [-20, 20]
    THEN SET turn_rate = -100

Each rule has an associated strength as well as a number of other statistics. During each decision cycle, all the rules that match the current state are identified. Conflicts are resolved in favor of rules with higher strength. Rule strengths are updated based on rewards received after each training episode. The following stimuli were defined:

• range1 .. range9: Value between 0.5 and 20 feet in 1-foot increments, which specifies the distance to the closest obstacle within the sensor's field of view.
• range: Value between 0 and 800 feet in 1-foot increments, which specifies the distance to the target.
• bearing: Value between -180 and 180 degrees in 20-degree increments, which specifies the bearing to the target.

The action parameter, turn rate, specified the turn rate for the MAV and took on values between -20 and 20 degrees in 5-degree increments.

4.2 Learning Method

The system must learn a collision-free navigation behavior. In this study, the behaviors are evolved using the SAMUEL rule learning system. SAMUEL uses standard genetic algorithms and other competition-based heuristics to solve sequential decision problems. It features Lamarckian operators (specialization, generalization, merging, avoidance, and deletion) that modify rules based on interaction with the environment. SAMUEL has to perform a number of evaluations (on the order of 80 in the current study) in order to provide history for the Lamarckian operators, to coalesce rule strengths, and to account for the noise in the evaluations. The original system implementation and default learning parameters pertinent to evolution of rule sets are described in greater detail in [Grefenstette1990] and [Grefenstette1991].

4.3 Fitness Function Contribution

The fitness of the controller depends on the distance the MAV traveled during successful trials when the goal location is reached, or on the minimum distance away from the target during an unsuccessful trial when the agent crashed or ran out of time, and contributes either [0.5 - 0.8] or [0.0 - 0.3], respectively, to the global fitness function. The contribution is calculated as follows:

$$
f_{FIT}(\vec{x}) =
\begin{cases}
0.3 \left( 1.0 - \dfrac{D_A}{D_S} \right), & \text{unsuccessful trial} \\
0.5 + 0.3 \left( \dfrac{D_S}{D_T} \right), & \text{successful trial}
\end{cases}
$$

where D_A is the minimum distance away from the target during the trial, D_S is the initial distance away from the target, and D_T is the total distance traveled during the trial.

5 Evolution of Form

The behavior an agent adopts for a task is determined by its ability to interact with and sense the environment. There is a wide variety of sensors that could be implemented on the MAV, but the final makeup of the sensor suite is constrained by the size, weight, and power capacity of the vehicle. The objective of this study is to evolve the most power-efficient sensor suite that guarantees efficient task-specific control. Power efficiency is assumed for this study to be inversely proportional to the sensing ability of the agent, determined by its sensor suite coverage.

5.1 Problem Representation

Our range sensor model is based on a simple range sensor such as sonar or radar. It returns the range to a single, closest obstacle in its field of view. The possible evolvable sensor characteristics include:

• range of the individual sensor;
• beam width of the individual sensor;
• placement of the individual sensor on the vehicle.

In this study, the beam width and the range of each of the available sensors are being evolved. The number of sensors is evolved implicitly, since values of beam width or range equal to zero imply that the sensor isn't used. Nine sensors are placed symmetrically along the direction of flight and radially from the center of the vehicle in increments of 22.5 degrees. To decrease the search space, the symmetry along the forward axis is exploited and only the forward sensor and four sensors along one side are represented. The four sensors along the other side of the vehicle are identical to the first four. The maximum beam width of the sensor is 45 degrees while the maximum sensing range is 20.0 feet.

The sensor suite characteristics are represented as a vector of ten values: the beam width and the range for five unique sensors, each represented by a floating-point value between 0.0 and 1.0. For each sensor, the first gene value is mapped to 0 to 45 degrees, which defines its beam width, and the second value is mapped to 0 to 20 feet, which defines its sensing range.

5.2 The Learning Method

The sensor suite characteristics are also evolved using SAMUEL. In addition to the rule set representation, SAMUEL allows a set of parameters to be attached to each of the rule sets, which we use as described above to represent the sensor characteristics. On these parameters, SAMUEL uses Gaussian mutation (mu = 0 and sigma = 0.15) and two-point crossover. It uses a fitness-proportional selection method to choose the individuals out of the population: the number of offspring is proportional to the parent's fitness.

5.3 Fitness Function Contribution

The fitness of the sensor suite is inversely proportional to its coverage and contributes [0.0 .. 0.2] to the global fitness function, but only if the agent behavior allows it to complete the task, i.e. navigate safely to the target location. The coverage of the sensor suite is calculated as the sum of the areas of the sectors defined by the beam width and the range of individual sensors. The contribution is calculated as follows:

$$
f_{FORM}(\vec{x}) = 0.2 \left( 1.0 - \dfrac{C(\vec{x})}{C_{MAX}} \right)
$$

where C(x) is the coverage of the sensor suite and C_MAX is the maximum possible sensor coverage for the experiment; C_MAX is currently equal to 1413.0 square feet.

6 Experimental Design

Similar to [Grefenstette1992], we compared traditional evolution in a simulated environment with no feedback from the task environment to case-based continuous and embedded learning in a simulated environment which reflected the current state of the world. These can be viewed as alternative approaches to system development: in the first case, the learning is done offline in a simulation designed by the experts, while in the second case, the learning is performed online after the system has been deployed. We performed three separate experiments: two baseline experiments which explored evolution in a static simulation environment, and one which applied the anytime coevolution of form and function technique to a dynamic simulation environment. The total length of the experiment was 450 generations with 100 members in the population. The complexity of the environment was changed every 25 generations.

6.1 Experiment 1: Fixed complexity simulation model

In this experiment, all possible solutions throughout the length of the experiment were evaluated in a series of simulated environments with the same, constant environment complexity, independent of the changing environment. The tree density, which determines the complexity of the environment, was set to 2.5 trees per 100 square feet, which was previously determined to provide an adequate learning gradient and an acceptable level of generalization to other densities. Whenever the learning system found a solution which outperformed the previous one in the simulation, the online strategy was updated. The changes in the environment were not registered in the simulation and the learning continued uninterrupted throughout the whole experiment.

6.2 Experiment 2: Sampled complexity simulation model

Similarly to the first baseline experiment, in this experiment all the individuals were evaluated in a series of simulated environments of varied complexity, independent of the changing environment. The tree density of the environment was chosen at random from a uniform distribution over three densities: 1.25, 2.5, and 5 trees per 100 square feet. Whenever the learning system found a solution which outperformed the previous one in the simulation, the online strategy was updated. To establish the baseline, the changes in the environment were not registered in the simulation and the learning continued uninterrupted throughout the whole experiment.

6.3 Experiment 3: Dynamic simulation model

In this experiment, the individuals of the current generation were evaluated in a series of simulated environments with complexity determined by the current, changing environment. The tree density of the environments varied between the same densities as in the previous model: 1.25, 2.5, and 5 trees per 100 square feet. Each density was recognized as a separate case. For the first 3 periods (25 generations each), the cases were presented in increasing order of complexity. For the rest of the experiment, the complexity of the environment within each block of three cases was selected at random. Each case was presented a total of six times. For this study, the environments were presented in the following order: L (1.25), M (2.5), H (5.0), H, L, M, L, H, M, L, M, H, M, L, H, L, H, M. Whenever the learning system found a solution which outperformed the previous one in the simulation, the online strategy was updated. When a change in the environment was detected, the simulation was updated and the offline learning was reinitialized according to the case-based anytime learning strategy. On the first occurrence of a case, the population was initialized using a homogeneous, simple default set of rules. On subsequent occurrences, one half of the initial population was initialized based on a similarity metric between the current case and the previously observed cases in the case base, while the other half of the population was initialized using a default rule set. In this study, the similarity metric was simply defined as the absolute difference in tree density of the environment.

7 Results

The results of anytime coevolution of form and function in a changing environment for each of the approaches described in Section 6 are summarized in Figures 3 through 6 and in Table 1.

Figure 3: Summary of task performance in a changing environment. [Plot: performance (%) versus generations (0 to 450) for the Fixed, Sampled, and Anytime approaches; periods along the horizontal axis are labeled L, M, H, H, L, M, L, H, M, L, M, H, M, L, H, L, H, M.]

Figure 4: Task performance in the low complexity (1.25 trees per 100 sq. ft) environment. [Plot: performance (%) versus generations (0 to 150) for the Fixed, Sampled, and Anytime approaches.]

Figure 5: Task performance in the medium complexity (2.5 trees per 100 sq. ft) environment. [Plot: performance (%) versus generations (0 to 150) for the Fixed, Sampled, and Anytime approaches.]

Figure 6: Task performance in the high complexity (5.0 trees per 100 sq. ft) environment. [Plot: performance (%) versus generations (0 to 150) for the Fixed, Sampled, and Anytime approaches.]

Figures 3 through 6 show online performance of the best individuals for each approach. Each data point in the graphs represents the average performance of a best-so-far individual over 100 episodes. The data was averaged over 3 independent sets of runs for each of the baseline and the anytime learning experiments. In this study, the performance is defined as the number of times the MAV reached the goal out of a hundred. Figure 3 summarizes online performance of the system in the changing environment. The vertical lines in the plot mark the environment changes. The complexity of the environment for each period is provided along the horizontal axis; L indicates the lowest, M the medium, and H the highest tree density. Figures 4 through 6 present each level of the environment complexity individually with all the relevant periods concatenated.

The case-based continuous and embedded learning was able to outperform both alternative approaches. It is also worth noting that even though the simulation models were not updated during learning in Experiments 1 and 2, the evolved strategies were to a certain degree tolerant to the changes of the environment. Further, the strategies evolved in a simulation with a fixed complexity were more general than the ones evolved in a simulation which sampled the complexity space.

Table 1 summarizes the characteristics of the final sensor suites for each approach. The data was averaged over 3 independent sets of runs for each of the baseline and the anytime learning experiments. The beam width and the range of the five unique sensors and the total coverage of the sensor suite are presented. The goal of the evolution of form was to evolve a sensor suite with minimal coverage in order to maximize power efficiency of the vehicle, which was defined to be inversely proportional to the sensor coverage.

               Sensor 1       Sensor 2       Sensor 3       Sensor 4       Sensor 5
               Width  Range   Width  Range   Width  Range   Width  Range   Width  Range    Cov
Fixed           0.0    0.0    12.5    1.7    14.8    6.5    25.0    6.6     5.4   14.5    106.5
Sampled        15.8   10.8     7.5    3.7     3.2    5.7    29.6   11.8     2.2   10.4    108.4
Anytime (L)    23.2    9.8    14.5   10.4    16.2   12.2    30.9    6.1     5.1   13.7    108.6
Anytime (M)    17.0   10.3    16.1    8.9    27.5    4.3    11.9    4.4     3.6   11.9     82.1
Anytime (H)    14.0    7.7    18.1    8.7    12.2    6.5     9.7    1.7     2.9   14.1     70.8

Table 1: Characteristics of the sensor suites using traditional evolution in fixed and sampled simulation environments, and using case-based anytime learning in a dynamic simulation environment. Beam width (degrees) and range (feet) of the unique sensors and the total coverage (Cov, square feet) of the sensor suites are presented.

By design, the anytime learning approach allowed for a higher level of specialization of sensor suites for individual cases, but it was even able to improve on the sensor suite evolved in the static medium complexity environment. In general, the sensor suites evolved for the low density environment did not require the full set of sensors, and all sets included a narrow, far reaching front sensor and several shorter side sensors. The higher density environments required a more uniform distribution of sensing coverage between all available sensors.

These results show that anytime learning is a feasible approach to continuous coevolution of form and function.

8 Conclusions

In this paper, we presented an approach to continuous and embedded coevolution of form (the morphology) and function (the control behavior) for autonomous vehicles. While this study focused only on coevolution of the characteristics such as beam width and range of individual sensors in the sensor suite, and the reactive strategies for collision-free navigation for an autonomous micro air vehicle, this approach could be easily extended to evolution of more complete morphologies for more complex missions. The addition of an anytime (continuous and embedded learning) mechanism allows for more robust and adaptive systems. In particular, it opens the door for vehicles that can morph, that is, change their configuration on the fly for different aspects of a mission or to handle unexpected situations.

Experimental results were presented which showed that continuous and embedded learning is a feasible approach to anytime coevolution of form and function. Further experiments will be performed to determine appropriate anytime learning components for the domain, such as re-initialization policies or minimum case presence. We plan to extend this work to learn characteristics of an air vehicle's airframe that might be changed during a mission, such as the length of the tail structure, and the shape and geometry of the airfoils.

Acknowledgments

The work reported in this paper was supported by the Office of Naval Research under work requests N0001403WR20212 and N0001403WR20057.

Bibliography

[Balakrishnan1996] Balakrishnan, K. and V. Honavar. On Sensor Evolution in Robotics. In Koza, Goldberg, Fogel, and Riolo (eds.), Proceedings of 1996 Genetic Programming Conference GP-96; MIT Press, pp. 455-460; 1996.

[Bugajska2000] Bugajska, Magdalena D. and Alan C. Schultz. (2000) "Co-Evolution of Form and Function in the Design of Autonomous Agents: Micro Air Vehicle Project." In Proceedings of GECCO-2000 Workshop on Evolution of Sensors in Nature, Hardware, and Simulation; Las Vegas, NV, July 8, 2000.

[Bugajska2002] Bugajska, Magdalena D. and Alan C. Schultz. (2002) "Coevolution of Form and Function in the Design of Micro Air Vehicles." In Proceedings of 2002 NASA/DoD Conference on Evolvable Hardware (EH-2002); Washington, DC, July 15-18, 2002.

[Cliff1993] Cliff, D., I. Harvey, and P. Husbands. "Explorations in Evolutionary Robotics", Adaptive Behaviour (MIT Press) 2(1):71-108, 1993.

[Cliff1996] D. Cliff and G. F. Miller. "Co-evolution of Pursuit and Evasion II: Simulation Methods and Results" in: P. Maes, M. Mataric, J.-A. Meyer, J. Pollack, and S. Wilson (editors) From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior (SAB96). MIT Press Bradford Books, pp. 506-515, 1996.

[Floreano1996] Floreano, D. and F. Mondada. Evolution of Homing Navigation in a Real Mobile Robot. In IEEE Transactions on System, Man, and Cybernetics Part B; 26(3) 396-407; 1996.

[Funes1997] Funes, P. and J. Pollack. Computer Evolution of Buildable Objects. In Husbands and Harvey (eds.), Proceedings of The Fourth European Conference on Artificial Life, MIT Press, pp. 358-367; 1997.

[Grefenstette1990] Grefenstette, J. J., Ramsey, C. L. and Schultz, A. C. (1990). Learning sequential decision rules using simulation models and competition. Machine Learning 5(4), 355-381.

[Grefenstette1991] Grefenstette, J. J. The Users Guide to SAMUEL, Version 1.3. NRL Memorandum Report 6820, Naval Research Laboratory, 1991.

[Grefenstette1992] Grefenstette, J. J. and C. L. Ramsey. (1992) "An approach to anytime learning." In Proceedings of Ninth International Conference on Machine Learning, pp. 189-195, Morgan Kaufmann, San Mateo, CA.

[Harvey1992] Harvey, I., P. Husbands, and D. Cliff. Issues in Evolutionary Robotics. Proceedings of the Second International Conference on Simulation of Adaptive Behaviour; MIT Press Bradford Books, 1993.

[Hornby2001] Hornby, G. S. and J. B. Pollack (2001). Body-Brain Co-evolution Using L-systems as a Generative Encoding. In Proceedings of the Genetic and Evolutionary Computation Conference 2001, San Francisco, CA; pp. 868-875.

[Husbands1996] Husbands, P., G. Jermy, M. McIlhagga, and R. Ives. Two Applications of Genetic Algorithms to Component Design. In selected papers from AISB Workshop on Evolutionary Computing; Fogarty, T (ed.), Springer-Verlag; Lecture Notes in Computer Science, 1996.

[Lee1996] Lee, Wei-Po, J. Hallam, and H. H. Lund. A Hybrid GP/GA Approach for Co-evolving Controllers and Robot Bodies to Achieve Fitness-Specific Tasks. In Proceedings of IEEE Third International Conference on Evolutionary Computation, IEEE Press, NJ; 1996.

[Lichtensteiger1999] Lichtensteiger, L. and P. Eggenberger. Evolving the Morphology of a Compound Eye on a Robot. Proceedings of the Third European Workshop on Advanced Mobile Robots (Eurobot 99); 1999.

[Lund1997] Lund, H. H., J. Hallam, and W-P. Lee. Evolving Robot Morphology. Proceedings of IEEE Fourth International Conference on Evolutionary Computation; IEEE Press, NJ, 1997.

[Lund1998] Lund, H. H. and O. Miglino. Evolving and Breeding Robots. In Proceedings of First European Workshop on Evolutionary Robotics; Springer-Verlag, 1998.

[Mark1998] Mark, A., D. Polani, and Thomas Uthmann. A Framework for Sensor Evolution in a Population of Braitenberg Vehicle-like Agents. In C. Adami, R. Belew, H. Kitano, and C. Taylor (eds.), Proceedings of Artificial Life VI; pp. 428-432; 1998.

[Nolfi1994] Nolfi, S., Floreano, D., Miglino, O., and F. Mondada. How to Evolve Autonomous Robots: Different Approaches in Evolutionary Robotics. In R. Brooks and P. Maes (eds.), Proceedings of the International Conference Artificial Life IV; MIT Press, pp. 190-197; 1994.

[Potter2001] Potter, Mitchell A., Lisa A. Meeden, and Alan C. Schultz. Heterogeneity in the Coevolved Behaviors of Mobile Robots: The Emergence of Specialists. In Proceedings of The Seventeenth International Conference on Artificial Intelligence, Seattle, WA; August 4-10, 2001.

[Ramsey1994] Ramsey, C. L. and J. J. Grefenstette. (1994) "Case-based anytime learning." In Case Based Reasoning: Papers from the 1994 Workshop, D. W. Aha, ed., Technical Report WS-94-07, AAAI Press, Menlo Park, CA.

[Schultz1996] Schultz, A. C., J. J. Grefenstette, and William Adams. RoboShepherd: Learning a complex behavior. Presented at RobotLearn96: The Robotics and Learning workshop at FLAIRS 96; 1996.

[Schultz2000] Schultz, Alan C. and J. J. Grefenstette. (2000) "Continuous and Embedded Learning in Autonomous Vehicles: Adapting to Sensor Failures." In Proceedings of Unmanned Ground Vehicle Technology II, (Eds. Grant R. Gerhart, Robert W. Gunderson, Chuck M. Shoemaker), Proceedings of SPIE Vol. 4024, pg. 55-62, 2000.

[Sims1994] Sims, K. Evolving 3D Morphology and Behavior by Competition. In R. Brooks and P. Maes (eds.), Proceedings of the International Conference Artificial Life IV; MIT Press, Cambridge, MA, pp. 28-39; 1994.
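Illustrative sketch of the global fitness computation. The fitness contributions described in Sections 2.3, 4.3, and 5.3 can be put together in a few lines of Python. This is a minimal sketch and not the SAMUEL implementation used in the study: the function names, the choice of which gene pair is the forward sensor (`forward_index`), and the clamping of out-of-range values are all assumptions made for illustration. Only the constants (45 degrees, 20 feet, 1413.0 square feet) and the two formulas come from the text.

```python
import math

MAX_BEAM_DEG = 45.0   # maximum beam width (Section 5.1)
MAX_RANGE_FT = 20.0   # maximum sensing range (Section 5.1)
C_MAX = 1413.0        # maximum coverage in square feet (Section 5.3)


def decode_sensors(genes):
    """Map ten genes in [0, 1] to five (beam_width_deg, range_ft) pairs."""
    if len(genes) != 10:
        raise ValueError("expected 10 gene values (5 sensors x 2)")
    return [(genes[i] * MAX_BEAM_DEG, genes[i + 1] * MAX_RANGE_FT)
            for i in range(0, 10, 2)]


def coverage(sensors, forward_index=0):
    """Total sector area of all nine sensors, in square feet.

    Assumption: the pair at forward_index is the single forward sensor;
    the other four are mirrored across the forward axis, so they count twice.
    """
    areas = [0.5 * math.radians(width) * rng ** 2 for width, rng in sensors]
    forward = areas[forward_index]
    return forward + 2.0 * (sum(areas) - forward)


def f_fit(reached_goal, d_start, d_min_away, d_traveled):
    """Controller contribution: [0.5, 0.8] on success, [0.0, 0.3] otherwise."""
    if reached_goal:
        # Assumption: clamp the ratio so the result stays within [0.5, 0.8].
        return 0.5 + 0.3 * min(1.0, d_start / d_traveled)
    # Assumption: clamp at 0.0 if the MAV never got closer than its start.
    return max(0.0, 0.3 * (1.0 - d_min_away / d_start))


def f_form(cov):
    """Sensor-suite contribution in [0.0, 0.2]; smaller coverage scores higher."""
    # Assumption: clamp at 0.0, since the exact maximum slightly exceeds C_MAX.
    return max(0.0, 0.2 * (1.0 - cov / C_MAX))


def global_fitness(genes, reached_goal, d_start, d_min_away, d_traveled):
    """Add the form contribution only when the task was completed."""
    total = f_fit(reached_goal, d_start, d_min_away, d_traveled)
    if reached_goal:
        total += f_form(coverage(decode_sensors(genes)))
    return total
```

With every gene set to 1.0, `coverage` returns nine full sectors of 45 degrees and 20 feet, about 1413.7 square feet, which is in line with the C_MAX value quoted in Section 5.3.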

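Illustrative sketch of case-based re-initialization. The re-initialization policy described in Section 6.3 (default rules on the first occurrence of a case, then half of the population seeded from the case base and half from the default rule set) might look roughly as follows. This is a sketch under stated assumptions, not the procedure used in the experiments: the `Case` record, the use of a string as a stand-in for an evolved strategy, and the way seeds are drawn (cycling through stored cases ranked by density difference, then by fitness) are hypothetical. The text only specifies the half-and-half split and the absolute difference in tree density as the similarity metric.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    tree_density: float  # trees per 100 square feet when the strategy evolved
    strategy: str        # hypothetical stand-in for rule set plus sensor genes
    fitness: float       # fitness the strategy achieved under that density


def reinitialize(case_base, density, default_strategy, pop_size=100):
    """Build the initial population after a detected environment change."""
    seen_before = any(math.isclose(c.tree_density, density) for c in case_base)
    if not seen_before:
        # First occurrence of this case: homogeneous default population.
        return [default_strategy] * pop_size

    # Similarity metric from Section 6.3: absolute difference in tree density.
    # Assumption: ties are broken in favor of the fitter stored strategy.
    ranked = sorted(case_base,
                    key=lambda c: (abs(c.tree_density - density), -c.fitness))

    half = pop_size // 2
    # Assumption: fill the seeded half by cycling through the ranked cases.
    seeded = [ranked[i % len(ranked)].strategy for i in range(half)]
    return seeded + [default_strategy] * (pop_size - half)
```

A short usage example: with stored cases for the low and medium densities, a first encounter with the high density yields an all-default population, while a return to the low density seeds half of the population from the case base.

```python
case_base = [
    Case(1.25, "low_a", 0.7),
    Case(2.5, "med_a", 0.9),
    Case(1.25, "low_b", 0.8),
]
first_high = reinitialize(case_base, 5.0, "default", pop_size=10)
back_to_low = reinitialize(case_base, 1.25, "default", pop_size=10)
```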