Spatio-Temporal Case-Based Reasoning for Efficient Reactive Robot Navigation†

Maxim Likhachev‡, Michael Kaess, Zsolt Kira, and Ronald C. Arkin
Mobile Robot Laboratory, College of Computing, Georgia Institute of Technology
[email protected], [email protected], [email protected], [email protected]

† This research is supported by DARPA/U.S. Army SMDC contract #DASG60-99-C-0081. Approved for Public Release; distribution unlimited.
‡ Presently at Carnegie Mellon University.

Abstract—This paper presents an approach to the automatic selection and modification of behavioral assemblage parameters for autonomous navigation tasks. The goal of this research is to make obsolete the task of manually configuring behavioral parameters, which often requires significant knowledge of robot behavior and extensive experimentation, and to increase the efficiency of robot navigation by automatically choosing and fine-tuning, in real time, the parameters that fit the robot's task-environment well. The method is based on the Case-Based Reasoning paradigm. From incoming sensor data, the approach computes spatial features of the environment; from the robot's performance, it then computes temporal features. Both sets of features are used to select and fine-tune a set of parameters for the active behavioral assemblage. By continuously monitoring the sensor data and the robot's performance, the method reselects these parameters as necessary. While a mapping from environmental features onto behavioral parameters, i.e., the cases, can be hard-coded, a method for learning new cases and optimizing existing ones is also presented, completely automating the process of behavioral parameterization. The system was integrated within a hybrid robot architecture and extensively evaluated in simulations and in indoor and outdoor real-world robotic experiments across multiple environments and sensor modalities, clearly demonstrating the benefits of the approach.

Index terms: Case-Based Reasoning, Behavior-Based Robotics, Reactive Robotics.

[Figure 1. Behavioral selection process with the Case-Based Reasoning module incorporated. A library of cases (Behavior 1 with a 1st set of parameters through Behavior N with a Kth set of parameters) supplies a behavior roughly corresponding to the robot's current goal; the adaptation process of the selected case adjusts it to the robot's current environment, yielding a behavior fine-tuned to that environment.]

I. INTRODUCTION

Behavior-based control for robotics is known to provide good performance in unknown or dynamic environments. Such robotic systems require little a priori knowledge and respond very quickly to changes in the environment, as they advocate a tight coupling of perceptual data to action. At any point in time a robot selects a subset of behaviors, called a behavioral assemblage, from the set of predefined behaviors based on incoming sensory data, and then executes them. One of the problems of this approach, however, is that as the surrounding environment gradually changes, the parameterization of the selected behaviors should also be adjusted correspondingly. Using a constant, non-adaptive parameterization, for most non-trivial cases, results in robot performance that is far from optimal. Also, choosing the "right" set of parameters even in the case of constant parameterization is a difficult task, requiring both knowledge of robot behaviors and a number of preliminary experiments. It is desirable to avoid this manual configuration of behavioral parameters in order to make mission specification as user-friendly and rapid as possible. It is also desirable to avoid the requirement of knowing when, and which types of, environments a robot will encounter during its mission.

This paper presents a solution to these problems by incorporating Case-Based Reasoning into the behavior selection process. Case-Based Reasoning is a form of learning in which specific instances or experiences are stored in a library of cases and are later retrieved and adapted to similar situations when they occur [12]. The Case-Based Reasoning (CBR) module operates at the reactive level of the robot. Given a particular behavioral assemblage that the robot executes, the CBR module selects the set of parameters for the chosen behaviors that is best suited for the current environment. As the robot executes its mission, the CBR module controls the switching between different sets of behavioral parameters in response to changes in the environment. Each such set of parameters constitutes a case in the CBR library of cases and is indexed by spatial and temporal features of the environment. Spatial features are computed from the sensory data of the robot, while temporal features are computed based on the robot's performance.
[Report Documentation Page (Standard Form 298), report date 2005, omitted. Distribution statement: approved for public release, distribution unlimited. The original document contains color images.]

The adaptation step in Case-Based Reasoning subsequently fine-tunes the parameters to a specific type of environment, allowing the library of cases to remain small. The overall control flow is shown in figure 1. Such a method permits the automatic selection of optimal parameters at run time, while the mission specification process no longer requires manual configuration of these values.

[Figure 2. Integration of the case-based reasoning module within the AuRA architecture. The high-level plan (FSA) receives the perceptual input and selects the current state and current goal; the resulting behavioral assemblage, parameterized by the CBR unit via a case index into the case library, is evaluated by the behavioral control unit to produce the motor vector.]

While the library of cases that the CBR module uses can be manually supplied for a given robot, this paper also proposes an extension to the CBR module that allows it to create and optimize cases in the library automatically, as the result of either experiments or actual mission executions.
In the learning mode, the module can start with a completely empty, a partially specified, or a fully specified library. It then adds new cases as necessary and optimizes both new and existing cases by performing a gradient descent search in the space of behavioral parameters for each case in the library. Once training is over and the robot exhibits good performance in the training session, the library can be "frozen".

Case-Based Reasoning methodology is not new to the field of robotics. It has been used successfully to help solve such problems as path planning based on past routes, high-level action selection based on environmental similarities, low-level action selection based on local sensor information, place learning, and the acceleration of complex problem solving based on past problem solutions [4, 5, 6, 7, 8, 14, 21]. Previous work on incorporating Case-Based Reasoning into the selection of behavior parameters has also been performed by our group [1, 2], on which the present research is partially based, and by a few others (e.g., [13]). The approach described in this paper, however, differs significantly from these previous algorithms by introducing: a novel feature identification mechanism that produces spatial and temporal vectors describing the current environment; a notion of traversability vectors that measure the degree of traversability around the robot in a configurable number of directions; randomization in the case selection process to allow for the exploration of cases; and a case switching decision tree that adaptively controls case switching based on case performance. This methodology results in very robust robot performance while allowing for an easy input and output vector space representation, straightforward extensions and modifications, and simple control of computational complexity depending on the available computational resources and the precision of the sensor data. Additionally, the work presented in this paper extends our previous work [1, 2] by incorporating Case-Based Reasoning within a hybrid robot architecture and extensively evaluating its performance on both real and simulated robots.

Extensive research has also been conducted on learning robot behaviors with other methods, such as neural networks, genetic algorithms, and Q-learning [15, 16, 17]. In contrast to some of these methods, this research concentrates on the automatic learning of an optimal parameterization of the behaviors rather than of the behaviors themselves. It also incorporates experiential knowledge through the Case-Based Reasoning methodology during learning and thus decreases the number of experiments required to obtain a good behavioral parameterization function, as defined by the use of a library of cases.

II. ARCHITECTURAL CONSIDERATIONS

The framework chosen for the integration of Case-Based Reasoning for behavioral selection is the MissionLab system [9], which is a version of AuRA (the Autonomous Robot Architecture) [10]. This hybrid architecture consists of a schema-based reactive system coupled with a high-level deliberative planning system. The reactive component consists of primitive behaviors called motor schemas [11] that are grouped into sets called behavioral assemblages. Each individual primitive behavior is driven by its perceptual input(s), the perceptual schema(s), producing its own motor response. The vectorial responses of the active schemas are added together in the behavioral control module as shown in figure 3, resulting in the overall behavioral output. The weighted sum of the vectors, after normalization, defines the final vector that is sent to the actuators. Hence, each motor schema affects the overall behavior of the robot.

Within MissionLab, a finite state automaton (FSA) defines the high-level plan of a robot's mission. Each state in the plan is one of the predefined behavioral assemblages chosen to achieve the robot's goal at that state. Transitions between states are triggered by perceptual inputs called triggers.

Every behavioral assemblage (a state in a high-level plan) is controlled by a set of parameters. Normally, these parameters would be carefully chosen by a user to correspond to the task-environment the robot is expected to inhabit. If optimal behavior for the robot is desired, states could be split into multiple alternative states with the same behavioral assemblage but with different sets of internal parameters, where each state is attuned to some particular environmental characteristics. This method would also require designing special perceptual triggers to detect relevant changes in environmental conditions. The methods described in this article avoid this complexity by introducing a Case-Based Reasoning module that, for the currently chosen behavioral assemblage, selects in real time the set of parameters best suited for the current environment. As the type of environment might change unexpectedly, the CBR module continually monitors, re-selects, and re-adapts the assemblage parameters as necessary, according to current conditions.

[Figure 3. Interaction between the behavioral control module running a GOTO behavioral assemblage and the CBR module. The MoveToGoal, AvoidObstacles, Wander, and BiasMove schemas produce vectors V1-V4, which are weighted by gains w1-w4 and summed into the motor vector; the CBR module uses the sensor data to index the case library and returns the set of behavioral parameters.]
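The weighted summation and normalization performed by the behavioral control module can be sketched as follows. This is a minimal illustration in Python, not MissionLab code, and the vectors and gains in the usage note are made-up examples:

```python
import math

def blend_schemas(vectors, gains):
    """Weighted sum of motor-schema output vectors (the summation node of
    figure 3), normalized to a unit motor vector; the zero vector is
    returned when all schemas are silent."""
    x = sum(g * vx for g, (vx, _) in zip(gains, vectors))
    y = sum(g * vy for g, (_, vy) in zip(gains, vectors))
    norm = math.hypot(x, y)
    if norm == 0.0:
        return (0.0, 0.0)
    return (x / norm, y / norm)
```

For instance, a MoveToGoal vector (1, 0) with gain 1.0 combined with an AvoidObstacles vector (-0.5, 0.5) with gain 2.0 yields the motor vector (0, 1): the obstacle response cancels the forward component and deflects the robot sideways.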
A diagram of how the CBR module is integrated within MissionLab is shown in figure 2. The sensor reading is sent to both the high-level FSA-based planner and the CBR module. Based on the perceptual input, the same or a new state is selected. The chosen state defines a behavioral assemblage that is then passed into the behavioral control module. The chosen state identifier is also passed into the CBR module, along with relevant information about the robot's current goal, such as the goal's position. If the CBR module supports the current state, then based on the perceptual input, the goal information, and the state, a set of parameters for the behavioral assemblage is selected from the case library and adapted to better fit the environment. These parameters are passed into the behavioral control module, which applies them to the current behavioral assemblage. After evaluating this assemblage, the motor vector is produced and supplied to the actuators for execution. If the chosen state, however, is not supported by the CBR module, then the behavioral control module evaluates the behavioral assemblage with its default parameter values as defined in the finite state machine.

Currently, the CBR module supports navigational states of type GOTO, which are used for goal-directed navigation. This particular assemblage contains the following four primitive motor schemas, as shown in figure 3: MoveToGoal, Wander, AvoidObstacles, and BiasMove. The MoveToGoal schema produces a vector directed towards a specified goal location from the robot's current position. The Wander schema generates a random direction vector, adding an exploration component to the robot's behavior. The AvoidObstacles schema produces a vector repelling the robot from all of the obstacles that lie within some given distance from the robot. The BiasMove schema produces a vector in a certain direction in order to bias the motion behavior of the robot. The CBR module controls the following parameters:

<Noise_Gain, Noise_Persistence, Obstacle_Gain, Obstacle_Sphere, MoveToGoal_Gain, Bias_Vector_Gain, Bias_Vector_X, Bias_Vector_Y>

The gain parameters are the multiplicative weights of the corresponding active schemas. The Noise_Persistence parameter controls the frequency with which the random noise vector changes its direction. Obstacle_Sphere controls the distance within which the robot reacts to obstacles with the AvoidObstacles schema. Bias_Vector_X and Bias_Vector_Y specify the direction of the vector produced by the BiasMove schema. Thus, a case in the library consists of a set of values for the above parameters.

III. SPATIO-TEMPORAL CASE-BASED REASONING

This section describes a CBR module that uses a hand-coded library of cases; additional details on this method can be found in [3]. The next section describes the changes required to make the CBR module learn new cases, along with their implementation; additional details on these issues, and on the learning approach in general, can be found in [19].

A. Overview

The overall structure of the CBR module, shown in figure 4, is similar to that of a traditional non-learning Case-Based Reasoning system [12]. The sensor data and goal information are supplied to the Feature Identification sub-module, which computes a spatial features vector representing the relevant spatial characteristics of the environment and a temporal features vector representing the relevant temporal characteristics. Both vectors are passed forward to the best matching case selection process.

[Figure 4. High-level structure of the CBR module. The Feature Identification sub-module converts the current environment into spatial and temporal feature vectors; spatial matching (1st stage of case selection), temporal matching (2nd stage), and a random selection process (3rd stage) narrow the library down to the best matching case, which the case switching decision tree, case adaptation, and case application steps turn into the output (behavioral assemblage) parameters.]

During the first stage of case selection, all the cases in the library are searched, and the distances between their spatial feature vectors and the current environmental spatial feature vector are computed. These distances define the spatial similarities of the cases with the environment. The case with the highest spatial similarity is the best spatially matching case. However, all the cases with a spatial similarity within some delta of the similarity of the best spatially matching case are also selected for the next stage of the selection process. These cases are called spatially matching cases.
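The "within some delta of the best similarity" banding used by the matching stages can be sketched as below; the Case structure and its field names are assumptions for illustration, not the actual MissionLab types:

```python
from dataclasses import dataclass

@dataclass
class Case:
    spatial: list    # traversability-style spatial index
    temporal: list   # [R_s, R_l] temporal index
    params: dict     # output: behavioral assemblage parameter values

def within_delta(cases, score, delta):
    """Keep every case whose similarity score lies within `delta` of the
    best score among `cases`; the result is never empty."""
    best = max(score(c) for c in cases)
    return [c for c in cases if score(c) >= best - delta]
```

Applying `within_delta` first with a spatial similarity function over the whole library, and then with a temporal similarity function over the survivors, reproduces the two matching stages.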
At the second stage of selection, all the spatially matching cases are searched, and the distances between their temporal feature vectors and the computed environmental temporal feature vector are generated. These distances define the temporal similarities of these cases with the environment. The case with the highest temporal similarity is the best temporally matching case. Again, all the cases with a temporal similarity within some delta of the similarity of the best temporally matching case are selected for the next stage of the selection process. These cases are the spatially and temporally matching cases and constitute all the cases with close spatial and temporal similarity to the current environment. This set usually consists of only a few cases and is often just one case. The set, however, can never be empty, as the case most similar to the environment is always selected, independently of how dissimilar this case might be.

The last phase of the selection stage is a uniformly random selection from the set of spatially and temporally matching cases. The idea is that these cases are all close enough to the current environment; their output parameter vectors, however, might be very different. One specific pair of temporal and spatial feature vectors does not necessarily map onto an optimal solution, due to possible aliasing. As a result, all of the cases found to be sufficiently similar to the current environment deserve at least a chance to be tried.

The case switching decision tree is then used to decide whether the currently applied case should still be applied, or should be switched for the new case selected as the best matching one. This protects against thrashing and the overuse of cases. If a new case is chosen to be applied, it goes through the case adaptation and application steps. At the adaptation step, a case is fine-tuned by slightly readjusting the behavioral assemblage parameters contained in the case to better fit the current environment. At the application step, these parameters are passed on to the behavioral control module outside of the CBR module for execution.

B. Technical Details

1) Feature Identification Step

In this step, spatial and temporal feature vectors are produced based on current environment data. This data includes the sensor readings and the goal position. The sensor data are distances to obstacles measured by the robot's sonar or laser sensors.

The spatial feature vector consists of two elements: a distance D from the robot to the goal, and a sub-vector that represents an approximation of the obstacle density function around the robot and is computed as follows. The space around the robot is divided into K angular regions, as shown in figure 5. The regions are always taken in such a way that the bisector of the 0th region is directed toward the goal of the robot. Within each region, the cluster of obstacles that obstructs the region most of all is found. This can be done, for example, by sorting all obstacles within each region in the order of their obstruction angles and finding the single largest sequence of obstacles such that no space in between these obstacles is large enough for the robot to traverse through. The obstacle density function approximation vector is then represented by K pairs <σ, r>, where σ is the degree of obstruction of a region by the most obstructing cluster in that region, and r is the distance to this cluster.

[Figure 5. Computation of the spatial feature vector for K=4 regions. The robot is in the center of the circle. Thick lines are obstacles as detected by 12 sensors evenly placed around the robot. The circled clusters of obstacles within each region are the most obstructing clusters.]

Figure 5 demonstrates an example computation of the obstacle density. There are 12 sensors in this example, evenly spaced around the robot. The large circle is centered on the robot and describes a clipping circle, beyond which all detected obstacles are ignored in the computation of the density function. The encircled obstacles within each region define the most obstructing clusters within that region. Their corresponding degree of obstruction σ is computed as the ratio of the angle that they obstruct within the region to the angle of the whole region. Thus, σ is equal to 1.0 for region 1, indicating that the obstacles obstruct the region completely, whereas σ is equal to 2/3 for the 0th and 3rd regions, as the obstacles leave 1/3 of the angle in those regions free for traversing. Region 2 has σ equal to 0.0, since there are no obstacles detected within the region's clipping circle; the whole region is therefore available for traversing.

Figure 5 also shows the distance of the robot to the goal, D, which is the first element of the spatial feature vector. The number of regions K is determined based on the desired computational complexity and the resolution of the sensor data. If K is equal to the number of sensors on the robot, then the obstacle density function is the actual raw sensor data clipped by the clipping region. Thus, in the above example, setting K to more than four might not yield any benefit, as there are only three sensors per region anyway. On the other hand, setting K to a larger value might make sense if a robot uses a high-resolution sensor such as a laser range finder.

The temporal feature vector contains two scalar elements: a short-term relative motion R_s and a long-term relative motion R_l. The short- and long-term relative motion measures represent short- and long-term velocities of the robot, respectively, relative to the maximum possible velocity of the robot, and are computed as shown in formula (1). The same formula is used for the computation of both relative motion measures; however, the time window lengths used to compute the average robot positions differ between the long- and short-term computations, as shown in formula (2).

R_i = ||Pos_{i,longterm} − Pos_{i,shortterm}|| / (N · MaxVel),   for i = s, l    (1)

where N is the normalization constant, MaxVel is the maximum robot velocity, and Pos_{i,longterm} and Pos_{i,shortterm} are the average positions of the robot over long- and short-term time windows, respectively, updated according to formula (2) every time the CBR module is called.
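Formula (1) compares two exponentially averaged positions per measure. A minimal sketch follows, in which the decay coefficients, N, and MaxVel are illustrative values rather than the paper's; one instance would be created for i = s and one for i = l, each holding its own pair of averaged positions:

```python
class RelativeMotion:
    """One relative motion measure R_i: the distance between a long-window
    and a short-window exponential average of the robot's (x, y) position,
    normalized by N * MaxVel."""

    def __init__(self, a_short, a_long, n=10.0, max_vel=1.0):
        self.a_short, self.a_long = a_short, a_long
        self.n, self.max_vel = n, max_vel
        self.p_short = self.p_long = None

    def update(self, pos):
        if self.p_short is None:          # seed both averages on first call
            self.p_short = self.p_long = pos

        def ema(a, old, new):
            return tuple(a * o + (1.0 - a) * x for o, x in zip(old, new))

        self.p_short = ema(self.a_short, self.p_short, pos)
        self.p_long = ema(self.a_long, self.p_long, pos)
        dx = self.p_long[0] - self.p_short[0]
        dy = self.p_long[1] - self.p_short[1]
        return (dx * dx + dy * dy) ** 0.5 / (self.n * self.max_vel)
```

A robot that stops moving drags both averages toward the same point, so R decays toward 0; steady motion keeps the averages apart and R high.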
Thus, if a cluster where N is the normalization constant, MaxVel is the of obstacles is extremely close to the robot and it obstructs maximum robot velocity, and Pos and Pos are the whole region, then the region’s traversability measure f i,longterm i,shortterm average positions of the robot over long- and short-term becomes 0. If, however, obstacles obstruct the region time windows, respectively, and are updated according to minimally, or they are all beyond the circle of interest with formula (2) every time the CBR module is called. radius of Df, then the traversability measure f approaches 1. Pos = a *Posold +(1−a )*NewPos (2) To avoid large changes in the traversability vector i,j i,j i,j i,j for environment Fenv due to noise in sensor data, the vector for i = s,l and j = shortterm,longterm is passed through the smoothing filter given in formula (5). The coefficient b is chosen such as to have a decay time on where NewPos is a new current position of the robot, and the order of 5 to 10 sensor readings. the filter coefficient a is dependent on whether Pos , Pos , Pos or Pos is computed.s ,s hTorhtteurms, f env =b* f +(1−b)* f env,old (5) s,longterm l,shortterm l,longterm i i i for example, as,shortterm is set to a coefficient with decay time where fi is computed according to formula (4) based on the of five time cycles, whereas al,longterm is set to a coefficient spatial vector for the current environment and fienv,old is fienv with decay time of 600 time cycles. from the previous execution of the CBR module. Formula (3) summarizes the form of the spatial As every case in the library is represented by a and temporal vectors. traversability vector F and the current environment is  D  represented by a traversability vector Fenv, these vectors can Vspatial =  σ0M, r0  Vtemporal = RRsl (3) bathneed u wtsheeedi g ethnotve adirs ossneusmms e tonhfte. 
s sqTpuhaaetri easdlp aseitrimraoli rlsasi rmbiteiieltawsr eibteeynt w iase cceoansm ethp aeun tcedad ts heaess σ , r  environment traversability vectors. There is significantly  K−1 K−1 more weight given to the regions directed more towards the These two vectors define input features (indices) for cases goal. This assures that, for example, if a case and an and are passed in this form into the best matching case environment have clear-to-goal situations in the 0th region, selection described next. then the environment is more similar to this case than to any other case that might have other very similar regions, 2) Case Selection Process but does not have the clear-to-goal situation in the 0th The best matching case selection is broken into three region. Formula (6) shows the computation of spatial steps. In the first step, a set of spatially matching cases is similarity S. found. All the cases in the library contain their own spatial K−1 and temporal feature vectors. The similarity between the ∑w *(f − f env)2 i i i spatial feature vector for a case and an environment is used S =1− i=0 (6) to assess the degree to which the case matches the K−1 environment spatially. In order for spatial feature vectors to ∑w i be comparable, however, they are first transformed into i=0 traversability vectors. The traversability vector F where W is the vector of weights for each region, F is the eliminates actual distances by representing the degree to traversability vector of a case, and Fenv is the traversability which each region can be traversed. Formula (4) presents vector of the current environment. Thus, the perfect match the transformation from a spatial vector into the is represented by S equal to 1, and the maximum difference traversability vector. by S equal to 0. 
 f  After the spatially based case selection, the set of F = 0 , f = min(1, 1−σ*Df −ri), (4) spatially matched cases contains all the cases with spatial  M  i i D similarity S within some delta from the spatial similarity of fk−1 f the best spatially matching case. The best spatially matching case is defined as the case with the highest spatial D =max(D , min(D , D)) f min max matching similarity with the current environment. Similarly, at the second selection step the temporal similarity and the current case's spatial similarity is checked similarity with the current environment is computed for all against some threshold S . If the two conditions are diff the cases in the set of spatially matched cases according to satisfied, then the current case continues to be used. The formula (7). intent is that the current case should not be thrown away too w *(R −Renv)2 +w *(R −Renv)2 soon, unless the environment became significantly different S =1− l l l s s s (7) from what it was when the current case was initially w +w selected. If one or both of the conditions are unsatisfied, or l s where w and w are long- and short-term relative motion if the case was applied for longer than the suggested l s measure weights, < R, R > is a temporal vector for a case, threshold, the decision-making proceeds to check the long- s l and < Rsenv, Rlenv > is a temporal vector for the current term relative motion measure Rl. If it is larger than some environment. The long-term relative motion measure is threshold, then the case is more likely to be performing well given more weight indicating its greater importance in the and the short-term relative motion measure Rs should be assessment of temporal similarities. compared against a low threshold Rs low. If the short-term The best temporally matching case is the case that relative measure is also higher than the low threshold, it has the highest temporal similarity with the environment. 
The best temporally matching case is the case that has the highest temporal similarity with the environment. All cases with a temporal similarity within some delta of the temporal similarity of the best temporally matching case are selected from the set of spatially matched cases for the next selection stage. Thus, after the temporal-based selection process, the set of matched cases contains the cases that are both spatially and temporally similar to the environment.

Finally, at the third and last step of the case selection process, randomness is added to the selection: one case from the set of matched cases is selected uniformly at random. This selected case is declared to be the best matching case for the current environment.

3) Case Switching Decision Tree

At this step, the decision is made as to whether the best matching case or the currently applied case should be used until the next call to the CBR module. This decision is based upon a number of characteristics describing the potential capabilities of the best matching case and the current case. The decision tree is shown in figure 6.

At the root of the tree, the time the current case has been applied is checked against a threshold CaseTime that is specific to each case in the library. If the current case was applied for less time than the threshold, then the spatial similarity of the current case is checked against a threshold S_low, and the difference between the new case's spatial similarity and the current case's spatial similarity is checked against a threshold S_diff. If both conditions are satisfied, then the current case continues to be used. The intent is that the current case should not be thrown away too soon, unless the environment has become significantly different from what it was when the current case was initially selected. If one or both of the conditions are unsatisfied, or if the case was applied for longer than the suggested threshold, the decision-making proceeds to check the long-term relative motion measure R_l. If it is larger than some threshold, then the case is more likely to be performing well, and the short-term relative motion measure R_s is compared against a low threshold R_s_low. If the short-term relative measure is also higher than this low threshold, it suggests that the current case performs well, and it is exchanged for the new one only if its spatial similarity is very different from the environment or a much more similar case is found; otherwise, the current case in use remains unchanged. If, on the other hand, the short-term relative motion measure is less than the low threshold, then the case is switched to the new one. Going back to the long-term relative measure check: if it is smaller than the R_l threshold, then the case might not be performing well and, therefore, the short-term relative measure is compared against a stricter threshold R_s_threshold. If it falls below this threshold, then the new case is selected. Otherwise, the current case's spatial similarity is compared against a strict threshold S_high; if the similarity is less, then the new case is selected, otherwise the current case is given more time to exert itself.
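The decision tree just described can be rendered as nested conditionals. The threshold names follow the text, and the dictionary of values used in the usage note below is illustrative:

```python
def keep_current_case(t_applied, s_cur, s_new, r_l, r_s, th):
    """Case switching decision tree (figure 6): return True to keep the
    currently applied case, False to switch to the new best matching case."""
    # Young case in a still-similar environment: do not give up on it yet.
    if (t_applied <= th['case_time'] and s_cur > th['s_low']
            and s_new - s_cur < th['s_diff']):
        return True
    if r_l > th['r_l']:                   # long-term progress looks good
        if r_s > th['r_s_low']:           # short-term progress adequate too
            return (s_cur > th['s_low']
                    and s_new - s_cur < th['s_diff'])
        return False                      # short-term progress collapsed
    if r_s > th['r_s']:                   # long-term poor, short-term ok
        return s_cur > th['s_high']       # keep only a very good spatial match
    return False                          # both measures poor: switch
```

For example, with thresholds {'case_time': 10.0, 's_low': 0.5, 's_diff': 0.2, 'r_l': 0.4, 'r_s_low': 0.1, 'r_s': 0.3, 's_high': 0.9}, a freshly applied, still-similar case is kept, while a long-running case whose short-term motion has collapsed is switched out.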
If it is decided, however, to apply a new case, current case. The decision tree is shown in figure 6. then the new case needs to be fine-tuned to the current At the root of the tree, the time the current case was environment. applied is checked against some threshold CaseTime that is The adaptation algorithm is very simple: specific to each case in the library. If the current case was applied for less time than the threshold, then the spatial X = (Rl adaptthreshold + Rs adaptthreshold) / (Rl + Rs); similarity of the current case is checked against threshold Y = Rl adaptthreshold / Rl ; Z = R threshold) / R; S , and the difference between the new case's spatial s adapt s low If (R < R threshold and R < R threshold) l l adapt s s adapt Case Applied Time > CaseTime Threshold Increase Noise_Gain proportionally to X; Increase CaseTime proportionally to X; Yes No Limit Noise_Gain and CaseTime from above; Rl > Rl threshold S > S threshold Else if (Rl < Rl adaptthreshold) current low Increase Noise_Gain proportionally to Y; and Yes No S -S < S threshold Increase CaseTime proportionally to X; new current diff Limit Noise_Gain and CaseTime from above; Rs > Rs low threshold Rs > Rs threshold No Yes Else if (Rs < Rs adaptthreshold) Increase Noise_Gain proportionally to Z; Yes No Yes No Limit Noise_Gain from above; S > S threshold S > S threshold New Case Current Case End; current alnowd New Case current high S -S < S threshold The adaptation algorithm looks at both the long- new current diff Yes No term and short-term motion measures of the robot and Yes No increases the level of noise (random motion) in the robot’s behavior if any of the measures fall below the Current Case New Case Current Case New Case corresponding threshold. The amount to which the noise is Figure 6. Case switching decision tree. 
increased is proportional to how long the robot's progress was impeded, as determined by the long- and short-term motion measures. If the robot lacks progress for a sufficiently long time period, then the long-term motion measure R_l falls below its threshold and the CaseTime threshold is also increased, to ensure that the new case is applied long enough given the current environment. The adaptation of the case is followed by its application, which simply extracts the behavioral assemblage parameters from the case and passes them to the actively executing behavioral control module within the MissionLab system.

C. An Example of Operation

Figure 7 shows two runs of a simulated robot with and without the CBR module within MissionLab, which provides a simulator as well as logging capabilities, making the collection of the required statistical data easy. Black dots of various sizes represent obstacles, and the curved line across the picture depicts the trajectory of the robot. The mission area is 350 by 350 meters.

Figure 7. Robot runs in a simulated environment. Top: without CBR module; Bottom: with CBR module. The circles on the left show the values of the traversability vectors that correspond to the indicated points in the environment.

During the entire run the same behavioral assemblage is used. However, as the environment changes from one type to another, the CBR module re-selects the set of parameters that control the behavioral assemblage. As a result, a robot that does not use the CBR module requires a higher level of noise in its behavior in order to complete the mission (figure 7, top). If, however, the CBR module is enabled, then the Wander behavior is rarely used, the distance traveled by the robot is 23.7% shorter, and the mission completion time is 23.4% less (figure 7, bottom). For example, during the part of the run before the local minimum produced by two obstacles is encountered (point A in figure 7, bottom), the robot uses case 1, called the CLEARGOAL case (figure 8b, left). In this case no noise is present in the robot behavior, making the trajectory a straight line. When the robot approaches the two obstacles (point B in figure 7, bottom), it switches to case 2, called FRONTOBSTRUCTED_SHORTTERM (figure 8b, right). In this case, the gains of the Wander and BiasMove schemas and Obstacle_Sphere are increased. This ensures that the robot quickly gets out of the local minimum and then proceeds toward the goal, switching back to the CLEARGOAL case.

Figure 8. a) Environment features at points A (left) and B (right); b) Cases used at point A (left) and point B (right).

a) Environment characteristics:

                                    At point A                At point B
    Spatial vector:
      D (goal distance)             300                       275
      Region 0 (σ_0; r_0)           0.31; 5.13                1.00; 0.11
      Region 1 (σ_1; r_1)           0.71; 2.83                0.79; 0.11
      Region 2 (σ_2; r_2)           0.36; 7.03                0.38; 0.12
      Region 3 (σ_3; r_3)           0.54; 2.80                1.00; 0.11
    Temporal vector (0 = min, 1 = max):
      ShortTerm_Motion R_s          1.000                     0.010
      LongTerm_Motion R_l           0.931                     1.000
    Traversability vector (0 = untraversable, 1 = excellent):
      <f_0, f_1, f_2, f_3>          0.92, 0.58, 1.00, 0.68    0.02, 0.22, 0.63, 0.02

b) Cases used:

                                    Case 1 at A:              Case 2 at B:
                                    CLEARGOAL                 FRONTOBSTRUCTED_SHORTTERM
    Spatial vector:
      D (goal distance)             5                         5
      Region 0 (σ_0; r_0)           0.00; 0.00                1.00; 1.00
      Region 1 (σ_1; r_1)           0.00; 0.00                0.80; 1.00
      Region 2 (σ_2; r_2)           0.00; 0.00                0.00; 1.00
      Region 3 (σ_3; r_3)           0.00; 0.00                0.80; 1.00
    Temporal vector (0 = min, 1 = max):
      ShortTerm_Motion R_s          0.700                     0.000
      LongTerm_Motion R_l           1.000                     0.600
    Traversability vector (0 = untraversable, 1 = excellent):
      <f_0, f_1, f_2, f_3>          1.00, 1.00, 1.00, 1.00    0.14, 0.32, 1.00, 0.32
    Case output parameters:
      MoveToGoal_Gain               2.00                      0.10
      Noise_Gain                    0.00                      0.02
      Noise_Persistence             10                        10
      Obstacle_Gain                 2.00                      0.80
      Obstacle_Sphere               0.50                      1.50
      Bias_Vector_X                 0.00                      -1.00
      Bias_Vector_Y                 0.00                      0.70
      Bias_Vector_Gain              0.00                      0.70
      CaseTime                      3.0                       2.0

IV. LEARNING BEHAVIORAL PARAMETERIZATION

A. Technical Details

This section provides an overview of the learning CBR module, as shown in figure 9, emphasizing the extensions that were made to the non-learning CBR algorithm described previously.
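As a concrete check of the spatial features used throughout, the traversability values reported in figure 8 can be reproduced from formula (4). The sketch below treats the clamped goal distance D_f as a free parameter (the paper does not state D_min and D_max; the value 6.91 used here is an assumption chosen to be consistent with the reported numbers):

```python
def traversability(sigma, r, d_f):
    """Formula (4): traversability of a region with obstacle density
    sigma and mean obstacle distance r, given clamped goal distance d_f.
    Dense, nearby obstacles drive f toward 0; sparse or distant ones
    leave f at its ceiling of 1."""
    return min(1.0, 1.0 - sigma * (d_f - r) / d_f)

# (sigma_i, r_i) per region at point B of figure 8; d_f = 6.91 is an
# assumed value, not one stated in the paper.
regions_b = [(1.00, 0.11), (0.79, 0.11), (0.38, 0.12), (1.00, 0.11)]
f_b = [round(traversability(s, r, 6.91), 2) for s, r in regions_b]
print(f_b)  # matches the reported vector <0.02, 0.22, 0.63, 0.02>
```

The same function with the same d_f also reproduces the traversability vectors at point A and for both stored cases, which is why this reconstruction of formula (4) is used above.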
First, as before, the sensor data and goal information are provided to the Feature Identification sub-module that operates identically to its non-learning CBR module counterpart. The resulting spatial and temporal feature vectors are then passed to the best matching case selection process.

Figure 9. High-level structure of the learning CBR module.

1) Case Selection

As before, at the spatial case selection step the spatial similarity between each case in the library and the current environment is computed as the weighted Euclidean distance between the case and environmental traversability vectors. Now, however, instead of selecting all the cases that have a spatial similarity within some delta from the similarity of the best spatially matching case, the cases are selected at random, with their probability of being selected proportional (according to an exponential function) to the difference between their spatial similarity and the spatial similarity of the best spatially matching case. Figure 10 illustrates this case selection process. Case C1 is the best spatially matching case and has a 100 percent probability of being selected into the set of spatially matching cases. Cases C2 and C4 are also selected as a result of random case selection biased by their spatial similarities. The idea behind adding such randomness to the case selection process is to bias the exploration of cases by their similarities with the environment. Similarly, at the temporal case selection stage, the cases that were selected as spatially matching cases go through the random selection process, with the probability of being selected biased by the differences between their temporal similarity and the temporal similarity of the best temporally matching case. Thus, in the example in figure 10, case C4 is the best temporally matching case and therefore is selected for the next selection step. Case C1 is also selected at random for the next selection step, whereas C2 is not. The cases that pass these two selection stages are also called spatially and temporally matching cases and are forwarded to the last case selection stage.

At the last selection step just one case is selected at random, with a probability of being selected proportional to the weighted sum of case spatial similarity, temporal similarity, and case success. The case success is a scalar value that reflects the performance of the case, and is described below. Thus, for the example shown in figure 10, C1 has a higher weighted sum of spatial and temporal similarities and success, and therefore has a higher chance of being selected than C4. In this particular example, C1 is indeed selected as the best matching case.

Figure 10. Case selection process.

Once the case is selected, as before, the case switching decision tree decides whether to continue to use the currently applied case or switch to the selected best matching case. If the switching decision tree says that the currently applied case should remain active, then nothing else needs to be done in this cycle of the CBR module. Otherwise, the CBR module continues its execution with the evaluation of the currently applied case performance, as described below.

2) Old Case Performance Evaluation

The velocity of the robot relative to the goal, that is, the speed with which the robot is approaching its goal, is used as the main criterion for the evaluation of case performance. Since for some cases the task is to get the robot closer to the goal, while for other cases the task is to get the robot out of local minima such as "box canyons" created by obstacles, the robot's velocity relative to the goal may not always be the best evaluation function for case performance. Instead, a delayed evaluation of the case performance may be necessary. For this reason the information on the last K applied cases is kept. K defines a learning horizon and in this work is chosen to be 2. Thus, when a new case is about to be applied, the performance evaluation function is called on each of the following cases: the case that was applied last; the case that was applied K cases ago and was not yet evaluated because the evaluation was postponed; and the case that was applied some time previously, was not yet evaluated, and is the case selected for a new application.

At the very beginning of the performance evaluation a check is done: if a case C was just applied and the robot did not advance towards its goal as a result of the case application, then the case performance evaluation is postponed. The robot did not advance if its average velocity V(C) relative to its goal, from the time just before case C was applied up until the current time, is not positive. Otherwise, the performance evaluation proceeds further. The pseudocode for the performance evaluation of a case C follows:

    Compute velocity V(C) according to Equation (8)
    If (V(C) ≤ 0 and C was applied last)
        //delayed reinforcement
        Postpone the evaluation of C until another K-1 cases are
        applied or C is selected for application (whichever comes first)
    else
        if (V(C) > µ⋅Vmax(C) and V(C) > 0)    //µ = 0.9
            I(C) = max(1, I(C) + 1);
        else
            I(C) = I(C) – 1;
        end
        I(C) = min(Imax, I(C));    //limit I(C); Imax = 100
        Update Vmax(C) according to Equation (9)
        if (C was applied last)
            if (V(C) > µ⋅Vmax(C) and V(C) > 0)
                Increase S(C) by ∆ proportional to I(C);
            else
                Decrease S(C) by ∆;
            end
        else
            if (Robot advanced towards its goal)
                Increase S(C) by ∆ proportional to I(C);
            else
                Decrease S(C) by ∆;
            end
        end
    end

Each case has a number of variables that represent the recent performance of the case and need to be updated in the performance evaluation routine. The average velocity V(C) of the robot relative to the goal for case C is computed as follows:

V(C) = (g_{t_b(C)} − g_{t_curr}) / (t_curr − t_b(C))    (8)

where t_b(C) is the time just before the application of case C, t_curr is the current time, and g_t is the distance to the goal at time t. One of the variables maintained by each case describing case performance is V_max(C): the maximum average velocity of the robot relative to the goal as a result of the application of case C. This velocity is updated after every performance evaluation of case C. Equation (9) is a form of "maximum tracker" in which V_max(C) very slowly decreases whenever it is larger than V(C) and instantaneously jumps to V(C) whenever V_max(C) is smaller than V(C):

V_max(C) = max(V(C), η⋅V_max(C) + (1−η)⋅V(C))    (9)

where η is a large time constant, here chosen to be 0.99.

However, before V_max(C) is updated, a decision is made on whether the case resulted in a performance improvement or not. The performance is considered to improve if V(C) > µ⋅V_max(C) and V(C) > 0, where µ is close to 1. Thus, the case performance is considered to be an improvement not only when the velocity is higher than it has ever been before, but also when a high velocity is reasonably sustained as a result of the case's application. The variable I(C) maintains the number of recent case performance improvements and is used in the adaptation step to search for the adaptation vector direction.

Finally, the case success S(C) is also updated. If the performance evaluation is not postponed, then the case success is increased if the case performance improved, where the performance improvement is defined by the same formula as before, and is decreased otherwise. If, however, the case evaluation was postponed, then the case success is increased if the robot advanced sufficiently towards its goal after the case was applied, and is decreased if the robot has not advanced at all. In either case the increase in the case success is proportional to the number of times the application of the case resulted in its performance improvement, I(C). This adds momentum to the convergence of case success. The more recent case improvements there are, the faster the case success approaches its maximum value of 1.0, indicating full convergence of the case. The case success is used in case selection to bias the selection process, and in case adaptation to control the magnitude of the adaptation vector. It will be discussed further below.

3) Case Creation Decision

At this step, a decision is made whether to create a new case or keep and adapt the case that was selected for application. This decision is based on the weighted sum of the temporal and spatial similarities of the selected case with the environment, and on the success of the selected case. If the success of the selected case is high, then it needs to be very similar to the environment, mainly spatially, in order for this case to be adapted and applied. This prevents making the case success diverge based on environments that do not correspond to the case. If the case success is low, then the case similarity need not be very close to the environment, and still the case is adapted and applied. In any event, the size of the library is limited (for this work a limit of 10 cases was used), and therefore if the library is already full then the selected case is adapted and applied.

If it is decided that a new case should be created, then the new case is initialized with the same output parameters (behavioral parameters) as the selected case, but its input parameters (spatial and temporal feature vectors) are initialized to the spatial and temporal feature vectors of the current environment. The new case is saved to the library and then passed to the adaptation step. If no new case is created, then the selected case is passed directly to the adaptation step.

4) Case Adaptation

Independent of whether the case to be applied is an old case or was newly created, the case still goes through the adaptation process. Every case C in the library also maintains an adaptation vector A(C) that was last used to adapt the case output parameters. If the case was just created, then the adaptation vector is set to a randomly generated vector. The adaptation of a case happens in two steps. First, based on the case's recent performance, the adaptation vector is used to adapt the case C output parameter vector O(C) as follows:

    if (I(C) ≤ 0)
        //change the adaptation direction
        A(C) = – λ⋅A(C) + ν⋅R;
    end
    //adapt
    O(C) = O(C) + A(C);

If the case improvement I(C) does not show evidence that the case was improved by the last series of adaptations, then the adaptation vector direction is reversed and damped by λ, and a random component ν⋅R is added to explore a different adaptation direction.
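The adaptation step above amounts to a gradient-free direction search over the output-parameter vector. A minimal sketch follows; the constants lam and nu, the vector dimensionality, and the use of random.uniform as the random vector R are illustrative assumptions, not values from the paper:

```python
import random

def adapt(o, a, i_c, lam=0.5, nu=0.1):
    """One adaptation step (sketch): if the improvement counter i_c
    shows no recent progress, reverse and damp the adaptation vector
    A(C) and add a random component nu*R; then apply A(C) to the
    output-parameter vector O(C)."""
    if i_c <= 0:  # no recent improvement: change the adaptation direction
        a = [-lam * ai + nu * random.uniform(-1.0, 1.0) for ai in a]
    o = [oi + ai for oi, ai in zip(o, a)]
    return o, a

# While the case keeps improving (i_c > 0) the same direction is reused:
o, a = adapt([2.0, 0.5], [0.1, -0.1], i_c=3)
print([round(x, 2) for x in o])  # [2.1, 0.4]
```

Because a successful direction is reapplied unchanged, parameters keep moving the same way until the improvement counter I(C) goes non-positive, at which point the search backtracks and perturbs its direction.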

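The velocity bookkeeping behind the performance evaluation, equations (8) and (9), can be expressed compactly; a minimal sketch, with the η = 0.99 value taken from the text:

```python
def avg_velocity(g_before, g_now, t_before, t_now):
    """Equation (8): average velocity toward the goal over the interval
    in which the case was applied (positive means the goal got closer)."""
    return (g_before - g_now) / (t_now - t_before)

def update_vmax(v_max, v, eta=0.99):
    """Equation (9): 'maximum tracker' -- V_max decays very slowly toward
    V when it is larger, and jumps to V instantly when V exceeds it."""
    return max(v, eta * v_max + (1.0 - eta) * v)

v = avg_velocity(300.0, 275.0, 0.0, 5.0)
print(v)  # 5.0 -- the robot closed 25 units of goal distance in 5 time units
print(round(update_vmax(10.0, v), 4))  # 9.95 -- slow decay toward v
print(update_vmax(4.0, v))             # 5.0 -- instant jump up to v
```

The slow decay means a single unusually fast run sets a high bar that only fades gradually, which is what makes the sustained-velocity test V(C) > µ·V_max(C) meaningful.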