TECHNICAL RESEARCH REPORT

Stochastic Language-based Motion Control

by Sean Andersson, Dimitrios Hristu-Varsakelis

CDCSS TR 2003-1
(ISR TR 2003-31)

CENTER FOR DYNAMICS AND CONTROL OF SMART STRUCTURES

The Center for Dynamics and Control of Smart Structures (CDCSS) is a joint Harvard University, Boston University, University of Maryland center, supported by the Army Research Office under the ODDR&E MURI97 Program Grant No. DAAG55-97-1-0114 (through Harvard University). This document is a technical report in the CDCSS series originating at the University of Maryland.

Web site: http://www.isr.umd.edu/CDCSS/cdcss.html

Report date: 2003.
Approved for public release; distribution unlimited.

Stochastic Language-based Motion Control

Sean Andersson
Electrical and Computer Engineering and Institute for Systems Research
University of Maryland, College Park, MD 20742
[email protected]

Dimitrios Hristu-Varsakelis
Mechanical Engineering and Institute for Systems Research
University of Maryland, College Park, MD 20742
[email protected]

Abstract—In this work we present an efficient environment representation based on the use of landmarks and language-based motion programs. The approach is targeted towards applications involving expansive, imprecisely known terrain without a single global map. To handle the uncertainty inherent in real-world applications, a partially-observed controlled Markov chain structure is used in which the state space is the set of landmarks and the control space is a set of motion programs. Using dynamic programming, we derive an optimal controller to maximize the probability of arriving at a desired landmark after a finite number of steps. A simple simulation is presented to illustrate the approach.

I. INTRODUCTION

As systems theory reaches into the domain of multi-modal systems, it reveals a complexity of behavior that is not usually encountered in classical models. This complexity is part of what motivates research in the subject, but at the same time it gives rise to new challenges when it comes to answering basic system-theoretic questions in the new setting. This point is perhaps most easily illustrated by the following example: knowing that a mobile robot or other autonomous system is controllable (by checking the properties of a governing differential equation) does not tell us whether it is possible (or how) to steer the robot between two locations in a reasonably complex environment. The reasons for this difficulty are twofold. First, the environment is at best only locally state space-like, with regions that are uninteresting or should be avoided. Second, a complex environment makes it difficult to design control laws, especially if one insists on doing so at the level of sensors and actuators.

Efforts to address the latter challenge have included research on the "motion description languages" MDL and MDLe [1], [2], [3], which provide a means for abstracting from the low-level details (e.g., kinematics and dynamics) of a control system. Control programs written in these languages combine feedback control laws and logic into strings that have meaning almost independently of the underlying system, much like desktop software achieves a level of hardware independence by relying on appropriate device drivers.

The design of a motion description language shapes the set of control laws that can be formulated, as does the choice of a representation for the environment. After all, feedback control is a map between observations and inputs. Perhaps then it should come as no surprise that language can be useful not only for expressing control tasks but also for describing the environment. In particular, [4] proposed representing the environment of a language-driven dynamical system by means of landmarks, linked together not by geometric relations but by the feedback control laws required to move from one location to another. This gives rise to a directed graph, with nodes corresponding to landmarks and edges identified with control programs encoded in the motion description language MDLe [2], [3]. This representation of the world makes contact with studies on human and animal navigation (see, e.g., [5]) that suggest the existence of two navigation systems used by mammals: a local response system and a global place-knowledge system. In simple terms, when the goal location is visible, local information is used to navigate; when moving to locations which are not visible, stored knowledge of the spatial structure of the world is used. Although landmark-based navigation has been explored extensively by other authors for localization [6], [7], navigation [8], [9] and descriptions of "large-scale" environments [10], the novelty of the approach in [4] is that geometric relationships and global coordinates are abandoned in favor of language-based instructions that can be interpreted down to control laws suitable for driving a differential equation-based model. This results in a parsimonious description of the world, without the need for global geometry and without mapping areas that are easily navigable or uninteresting.

In this work we use [4] as a point of departure to study language-driven control and navigation in a stochastic setting. We exploit classical results on partially-observed controlled Markov chains to obtain control programs (more precisely, strings in a formal language) that are optimal in the presence of uncertainty associated with the environment, with the sensors and actuators of the system under consideration, and with the precision of the language itself. The next section gives a brief description of MDLe. Section III presents the control problem we are concerned with and describes its Markov chain representation. In Section IV we derive control policies that are optimal for moving to a desired landmark. Section V contains simulation results that illustrate our approach.
II. MDLE

The starting point for MDLe is an underlying physical system, such as a mobile robot with a set of sensors and actuators, for which we wish to specify a motion control program. The system is assumed to be governed by a differential equation of the form

  ẋ = f(x) + G(x)u;  y = h(x) ∈ IR^p   (1)

where x(·) : IR^+ → IR^n is the state of the system, u(·) : IR^p × IR^+ → IR^m is a control law of the type u = u(t, h(x)), and G is a matrix whose columns g_i are vector fields in IR^n. The simplest element of MDLe is the atom, defined to be a triple of the form σ = (u, ξ, T), where u is as defined earlier, ξ : IR^p → {0, 1} is a boolean interrupt function defined on the space of outputs from p sensors, and T ∈ IR^+ denotes the value of time (measured from the time the atom is initiated) at which the atom will expire. To evaluate the atom is to apply the control law u until the interrupt ξ is low or until T units of time have elapsed. Atoms can be composed into a string, called a behavior, that carries its own interrupt function and timer. Behaviors can in turn be composed to form higher-level strings (called partial plans) and so on. We will use the term plan to refer to a generic MDLe string independent of the number of nested levels it contains. For more details on the language, including example programs, see [3].

III. LANDMARK-BASED NAVIGATION AMID UNCERTAINTY

We assume that there is a set, L = {L_1, ..., L_n}, of "interesting" or useful geographical locations which we call landmarks. These landmarks can take various forms, such as GPS coordinates, visual cues, or evidence grid maps [11]. In general, however, they are identified with local geographical information only; that is, they are not referenced to any global coordinate system. We associate to each landmark a sensor signature as follows. Let s(t) ∈ IR^p be the sensor data collected at time t and let L be the current landmark, taking values in {L_i} ∪ ∅. Then

  L = L_i if s(t) = s_i(t), t ∈ [t_0, t_0 + T]   (2)

where s_i(t), t ∈ [t_0, t_0 + T], is the sensor signature of the ith landmark. We do not assume these signatures to be unique, since a robot equipped with noisy sensors may at best be able to identify to within a subset of the collection of landmarks.

In the absence of sensing and actuator noise, one can replace geometric relationships between landmarks with instructions on how to get from one to the other [4]. The environment is then represented by a directed graph in which the nodes are the landmarks and the edges are associated with MDLe plans. In order to be practical, this approach must be modified away from its deterministic setting, since we cannot guarantee that a given plan will perform as expected every time, due to noisy sensing and control and environmental uncertainty.

To handle this uncertainty, we generalize the directed graph representation to a partially-observed controlled Markov chain. Given a collection of m MDLe plans denoted by G = {Γ_1, ..., Γ_m}, we associate to each plan a Markov matrix, A(k), specifying the transition probabilities between landmarks; thus [A(k)]_{ij} = p_{ij}(k) is the probability of ending at landmark L_j given that we begin at landmark L_i and execute plan Γ_k. At the completion of each plan an observation is made, giving us information about the current landmark.

It is important to note that this choice of representation places some restrictions on the set of landmarks and plans. Since the system does not know with certainty which landmark it is on at the completion of a plan, the effect of applying each plan from each landmark must be known; this is precisely the meaning of the Markov matrix A(k). Furthermore, each plan must guarantee that upon completion the system is at some landmark. A simple way of accomplishing this is, of course, to completely tile the world with landmarks. A more economical approach, however, is to choose plans carefully. For example, in an office environment it is possible to create plans which ensure the system will always end up inside an office rather than in a hallway, even though, due to changes in the environment such as people opening or closing their doors, the particular office cannot be specified with certainty. Thus, the use of feedback control laws encoded as MDLe plans enables a simplified description of the environment, in a manner akin to that by which feedback can reduce the complexity of motor programs [12].
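The atom and plan constructs of Section II can be sketched in code. The following is a minimal Python sketch; the class and function names, the discrete-time loop, and the interrupt polarity (here, the atom stops when the interrupt returns True) are illustrative assumptions, not the MDLe implementation of [3].

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Atom:
    """An MDLe atom sigma = (u, xi, T)."""
    u: Callable[[float, Sequence[float]], Sequence[float]]  # control law u(t, y)
    xi: Callable[[Sequence[float]], bool]                   # interrupt on sensor outputs
    T: float                                                # expiration time

def evaluate(atom, read_sensors, apply_control, dt=0.01):
    """Evaluate one atom: apply its control law until the interrupt
    fires or T units of time (from the atom's start) have elapsed."""
    t = 0.0
    while t < atom.T:
        y = read_sensors()
        if atom.xi(y):
            break                      # interrupt fired
        apply_control(atom.u(t, y))
        t += dt
    return t                           # time actually spent in the atom

def run_plan(plan, read_sensors, apply_control):
    """A behavior (plan) is a string of atoms, evaluated in sequence."""
    return [evaluate(a, read_sensors, apply_control) for a in plan]
```

A behavior-level interrupt and timer would wrap `run_plan` in the same pattern, which is how strings nest into partial plans.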
We thus restrict our observations to the collection of equivalence classes where two landmarks are deemed equivalent if their signatures are "close" based on some metric. We refer to this set as Z = {L̃_1, ..., L̃_p}, where p ≤ n and each L̃_i is a representative of its equivalence class.

We will classify navigation tasks into two categories. The first involves motion on or near a landmark. In this setting the robot knows what landmark it is on and possesses a map of the nearby terrain. Assuming the robot can use its sensors to localize itself on this map, navigation is in principle solved by path planning. In this paper we are concerned with navigation between landmarks where, because we have assumed that we do not have global geographical information, we cannot rely on any map.

IV. OPTIMAL NAVIGATION BETWEEN LANDMARKS

In order to use local navigation techniques the robot must know which landmark it is on. In this section, then, we propose a method of finding the sequence of MDLe plans that drives the robot to a desired landmark with maximal probability, in a time-optimal manner, under the assumption that such sequences exist. Recent work along these lines can be found in [13].

The navigation problem described in Section III is naturally discrete. To find the optimal sequence we turn to dynamic programming (DP) [14]. The state space for the robot is the collection of landmarks L, the control space is the collection Γ of MDLe plans, and the observation space is the collection Z of equivalence classes of landmarks. Let x_k, z_k, u_k be the state (location), observation, and control respectively at time k and let k ∈ {0, 1, ..., N}. We assume that we are given a sensor model for the robot; that is, we know the distribution Pr(z_k = j | x_k = i) giving us the probability of making observation L̃_j given that we are currently on landmark L_i. Define the usual information vector

  I_k = (z_0, z_1, ..., z_k, u_0, u_1, ..., u_{k-1})   (3)

and the vector of conditional probabilities

  P_{k|k} = (p^1_{k|k}, p^2_{k|k}, ..., p^n_{k|k})′   (4)

where ′ indicates transpose and p^j_{k|k} = Pr(x_k = j | I_k) is the probability of being in state L_j at time k given the information up to the current time. Using Bayes' rule and the assumption that the observation depends only on the state and not on the previous information or current control, we have

  p^j_{k+1|k+1} = Pr(x_{k+1} = j | I_{k+1})
                = Pr(z_{k+1} | x_{k+1} = j) Pr(x_{k+1} = j | I_k, u_k) / Σ_{i=1}^n Pr(z_{k+1} | x_{k+1} = i) Pr(x_{k+1} = i | I_k, u_k)   (5)

Now define

  P_{k+1|k} = A(u) P_{k|k}   (6)

so that Pr(x_{k+1} = j | I_k, u_k) = [P_{k+1|k}]_j. For ease of notation we also define the diagonal matrix

  P_z = diag(Pr(z | x_k = L_1), ..., Pr(z | x_k = L_n))   (7)

and the vector e = (1, 1, ..., 1)′. Using this notation, equation (5) has the form

  p^j_{k+1|k+1} = Pr(z_{k+1} | x_{k+1} = j) [P_{k+1|k}]_j / (e′ P_{z_{k+1}} P_{k+1|k})   (8)

We can then write the update equation for the conditional probability as the two-step iteration given by

  P_{k+1|k}   = A(u_k) P_{k|k}   (9)
  P_{k+1|k+1} = P_{z_{k+1}} P_{k+1|k} / (e′ P_{z_{k+1}} P_{k+1|k})   (10)

where P_{0|0} is a known initial distribution. To proceed with the DP algorithm we must choose the cost function we wish to minimize. We first choose to maximize the probability of arriving at a desired landmark, denoted d, at time N. To this end define the function

  g_N(x) = −1 if x = d, and 1 otherwise.   (11)

We denote a policy as π = {µ_0, µ_1, ..., µ_N}, where µ_k is the control function at time k. The cost function we wish to minimize is

  J_π(P_{0|0}) = E_{z_k, k=1,...,N} { E_x { g_N(x_N) | I_N } }   (12)

subject to the dynamics of (9), (10). The final cost is

  J_N(P_{N|N}) = E_x { g_N(x) | I_N } = Σ_{i=1}^n g_N(i) [P_{N|N}]_i = G′_N P_{N|N}   (13)

where we have made the obvious definition for the vector G_N. Applying one step of the DP algorithm yields

  J_{N−1}(P_{N−1|N−1}) = min_u E_{z_N} { J_N(P_{N|N}) } = min_u Σ_{i=1}^p G′_N P_{z_N=i} A(u) P_{N−1|N−1}   (14)

Thus the optimal control at the (N−1)th step is

  µ_{N−1} = argmin_u Σ_{i=1}^p G′_N P_{z_N=i} A(u) P_{N−1|N−1}   (15)

which simply minimizes the expected value of the cost over the final observation. Carrying the DP algorithm one more step, we find the (N−2)-stage cost to be

  J_{N−2}(P_{N−2|N−2}) = min_u E_{z_{N−1}} { J_{N−1}(P_{N−1|N−1}) }
                       = min_u Σ_{i1=1}^p Σ_{i2=1}^p G′_N P_{z_N=i1} A(µ_{N−1}) P_{z_{N−1}=i2} A(u) P_{N−2|N−2}   (16)

The optimal control at time N−2 is thus

  µ_{N−2} = argmin_u Σ_{i1=1}^p Σ_{i2=1}^p G′_N P_{z_N=i1} A(µ_{N−1}) P_{z_{N−1}=i2} A(u) P_{N−2|N−2}   (17)

which is the control which minimizes the expected value of the final cost over the last two observations. The general case is given by the following theorem.

Theorem 4.1: For k = N−1, ..., 0 the optimal cost to go is given by

  J_k(P_{k|k}) = min_u Σ_{i1=1}^p Σ_{i2=1}^p ··· Σ_{i_{N−k}=1}^p G′_N P_{z_N=i1} A(µ_{N−1}) P_{z_{N−1}=i2} A(µ_{N−2}) ··· P_{z_{k+1}=i_{N−k}} A(u) P_{k|k}

The usual corollary yields the optimal control policy.

Corollary 4.2: The optimal control at time k is

  µ_k = argmin_u Σ_{i1=1}^p Σ_{i2=1}^p ··· Σ_{i_{N−k}=1}^p G′_N P_{z_N=i1} A(µ_{N−1}) P_{z_{N−1}=i2} A(µ_{N−2}) ··· P_{z_{k+1}=i_{N−k}} A(u) P_{k|k}

A simple extension allows us to maximize this probability of arriving at the desired landmark in the minimum amount of time. To this end we define the functions

  g_k(x) = −b_k if x = d, and b_k otherwise,   (18)

and seek to minimize the cost function given by

  J_π(P_{0|0}) = E_{z_k, k=1,...,N} { G′_N P_{N|N} + Σ_{k=0}^{N−1} G′_k P_{k|k} }   (19)
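As a concrete illustration, the belief recursion (9)-(10) and the last-stage control (15) can be sketched in Python. The matrices here are hypothetical placeholders; note also that since the likelihood matrices P_z sum to the identity over z, the expectation in (15) collapses to G′_N A(u) P.

```python
import numpy as np

def belief_update(P, Au, likelihood_z):
    """Eqs. (9)-(10): propagate the belief P through the executed
    plan's Markov matrix Au, then apply the Bayes correction for the
    received observation (likelihood_z holds Pr(z | x = L_i))."""
    pred = Au @ P                # eq. (9): P_{k+1|k} = A(u) P_{k|k}
    post = likelihood_z * pred   # numerator of eq. (10)
    return post / post.sum()     # eq. (10), normalized

def control_last_stage(P, A, GN):
    """Eq. (15): the minimizing control at stage N-1. Summing the
    diagonal likelihood matrices over z gives the identity, so the
    expected final cost reduces to GN' A(u) P for each plan u."""
    costs = [GN @ (Au @ P) for Au in A]
    return int(np.argmin(costs))
```

For horizons longer than one, the same recursion is applied for each candidate observation sequence, which is the enumeration behind Theorem 4.1.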
The DP solution is given by the following theorem.

Theorem 4.3: For k = N−1, ..., 0 the optimal cost to go is given by

  J_k(P_{k|k}) = min_u [ G′_k P_{k|k}
    + Σ_{i1=1}^p G′_{k+1} P_{z_{k+1}=i1} A(u) P_{k|k}
    + Σ_{i1=1}^p Σ_{i2=1}^p G′_{k+2} P_{z_{k+2}=i1} A(µ_{k+1}) P_{z_{k+1}=i2} A(u) P_{k|k}
    + ··· + Σ_{i1=1}^p ··· Σ_{i_{N−k}=1}^p G′_N P_{z_N=i1} A(µ_{N−1}) ··· P_{z_{k+1}=i_{N−k}} A(u) P_{k|k} ]

The optimal control follows immediately from this theorem. We note that while the complexity of finding the optimal control increases exponentially with the number of stages, it grows only linearly in the number of landmarks.

V. SIMULATION RESULTS

To illustrate the proposed representation and the derived optimal control laws, a simple simulator was developed. The robot is modeled as a direct drive system obeying the following nonholonomic kinematics

  ẋ = u_f cos(θ),   (20)
  ẏ = u_f sin(θ),   (21)
  θ̇ = u_θ,          (22)

where

  u_f = (u_L + u_R)/2,   u_θ = (u_L − u_R)/w.   (23)

Here u_f and u_θ are the forward and heading velocities, u_L and u_R are the left and right wheel velocities, and w is the distance between the wheels. The robot is equipped with a set of range sensors. The environment is modeled by a set of polygons. The simulator accepts an MDLe plan specified as a list of atoms, and at each time step the current interrupt function is evaluated. If it has fired, the next atom is loaded; if not, the control function is evaluated to determine u_L and u_R. To model actuator noise, independent samples from a normal distribution are added to u_L and to u_R. The equations are then integrated forward by one time step and the sensors evaluated by intersecting each ray with the set of polygons modeling the environment. The process then repeats until the list of atoms is exhausted.

The office-like environment used for these simulations is shown in Figure 1 together with a virtual robot.

[Fig. 1. Environment and robot.]

Three landmarks, denoted L1, L2, and L3, were defined. Their (x, y) regions are shown in Figure 1. Each covered headings of (−80, −100) degrees. The following control functions were created.

• go [uf uθ]: Applies controls u_f and u_θ.
• goAvoid [ufN d kθ]: In the absence of obstacles within d of the front, sets u_f = u_fN. If an object is detected within d, sets u_f = u_fN (d − r_min) and u_θ = ±k_θ, with the sign chosen to steer away from the obstacle. (r_min = distance to obstacle.)
• followWall [ufN kf kθ d]: Maintains distance and heading by setting u_f = −k_f (2(d − r_min) + θ̃) sin(θ̃) and u_θ = −k_θ ((d − r_min) + 2θ̃), where r_min is the measured distance to the closest side wall and θ̃ is the estimate of the heading with respect to the wall. If both distance and heading errors are small, then sets u_f = u_fN and u_θ = 0.
• alignWall [kθ]: Sets u_f = 0 and u_θ = −k_θ θ̃, where θ̃ is the estimate of the heading with respect to the closest side wall.
• rotateAway [kθ]: Sets u_f = 0 and u_θ = −k_θ θ̃, where θ̃ is the estimate of the heading with respect to the rear wall.

The following interrupt functions were also defined.

• wait [τ]: Fires after τ seconds.
• sideOpen [side d τ]: Fires if the sensor on the side indicated by side (with 1 indicating left, 2 indicating right, and 3 indicating either) reads less than d, or if τ seconds have passed.
• alignedWall [ψ τ]: Fires if the estimated heading with respect to the nearest side wall is less than |ψ|, or if τ seconds have passed.
• rotatedAway [ψ τ]: Fires if the estimated heading with respect to the rear wall is less than |ψ|, or if τ seconds have passed.
• atWall [d τ]: Fires if the front sensor reads less than d, or if τ seconds have passed.

From these functions various atoms were constructed, and from the atoms five plans were defined, including the identity plan (denoted 1I) which applies a zero control. The remaining four (L2_1, L3_1, L3_2, and L1_3, where plan Lj_i is designed to steer the robot, in the absence of noise, from landmark L_i to landmark L_j) were designed accordingly. The system matrices were

  A(L2_1) = [ 0     0   0 ]      A(L3_1) = [ 0     0   0 ]
            [ 0.43  0   0 ]                [ 0.12  0   0 ]
            [ 0.57  1   1 ]                [ 0.88  1   1 ]

  A(L3_2) = [ 1  0  0 ]          A(L1_3) = [ 1  1  1 ]
            [ 0  0  0 ]                    [ 0  0  0 ]
            [ 0  1  1 ]                    [ 0  0  0 ]

The optimal controller of Corollary 4.2 was used as follows. The state was initialized and a three-stage controller run to steer the robot to the desired landmark. At the end of three stages, the probability vector was tested, and if the probability of being at the desired landmark was less than 0.95 the process was repeated.
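A single simulator step implementing (20)-(23), with additive wheel-speed noise as described above, might look as follows. The wheel base, step size and noise level are illustrative values, not the ones used in the paper's simulations.

```python
import math
import random

def step(state, uL, uR, w=0.3, dt=0.01, sigma=0.1):
    """One Euler step of the simulated robot of eqs. (20)-(23).
    Gaussian actuator noise is added to each wheel velocity, the
    wheel speeds are converted to forward/heading velocities, and
    the unicycle kinematics are integrated over dt."""
    x, y, th = state
    uL += random.gauss(0.0, sigma)      # actuator noise on each wheel
    uR += random.gauss(0.0, sigma)
    uf = (uL + uR) / 2.0                # eq. (23)
    uth = (uL - uR) / w
    return (x + dt * uf * math.cos(th), # eqs. (20)-(22)
            y + dt * uf * math.sin(th),
            th + dt * uth)
```

Ray-casting against the polygonal environment would then update the range sensors before the next interrupt evaluation.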
As an example, plan L3_2 is

{ (sideOpen [3 6 5]) (followWall [1 20 2 0.4])
  (atWall [0.3 30]) (goAvoid [1 0.05 1 0.025])
  (wait [0.75]) (go [0 1.57])
  (alignedWall [5 10]) (alignWall [2])
  (wait [0.5]) (followWall [1.25 20 2 0.4])
  (sideOpen [1 6 5]) (followWall [1 20 2 0.4])
  (wait [0.5]) (go [0 1.57])
  (rotatedAway [3 0.1 5]) (rotateAway [3])
  (wait [3.5]) (goAvoid [1 0.4 1 0.025])
  (wait [2]) (go [0 1.57])
  (alignedWall [1 10]) (alignWall [2]) }

where the notation is (interrupt)(control). This plan reads as follows. Follow the nearest wall until either side reads greater than six meters. Move forward, avoiding obstacles, until a wall is reached, then turn counter-clockwise and align to that wall. Follow the wall until the left side sensor reads greater than six meters. Turn counter-clockwise again and follow the wall. Rotate and align to the wall behind, move forward for three and a half seconds (but do not run into any intervening obstacles), and then rotate counter-clockwise 90°. Finally, align to the wall.

In Figures 2, 3, and 4 we show the evolution of the state, the true and observed landmarks, and the selected plan at each time step for a sample run with a true initial position on L1, an initial state of a uniform distribution across the states, and a desired final position on L2. This run shows the robustness of the approach to both the actuator and sensing noise; despite driving to an unintended location twice and getting several incorrect readings (including the final one), the controller was successful in achieving the objective.

[Fig. 2. L1 to L2: State evolution.]

It should be noted that the plans were chosen to be somewhat brittle with respect to the simulated noise. In plan L3_2, for example, the robot attempts to detect the opening to the next room quickly. Due to noise the robot may not have moved far enough, and the interrupt will fire too soon, causing the robot to end back on landmark two.
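The three-stage replanning procedure can be sketched in Python using the transition matrices and the a priori observation probabilities reported in Section V. The DP recursion, loop structure and round limit below are a sketch of the described procedure (columns of each matrix are read as the starting landmark, so beliefs propagate as P′ = A P); the sampled trajectory depends on the random seed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Transition matrices for plans 1I, L2_1, L3_1, L3_2, L1_3 and the
# a priori observation model Pr(z = i | x = L_j), as reported.
A = {
    "1I":   np.eye(3),
    "L2_1": np.array([[0.0, 0, 0], [0.43, 0, 0], [0.57, 1, 1]]),
    "L3_1": np.array([[0.0, 0, 0], [0.12, 0, 0], [0.88, 1, 1]]),
    "L3_2": np.array([[1.0, 0, 0], [0.0, 0, 0], [0.0, 1, 1]]),
    "L1_3": np.array([[1.0, 1, 1], [0.0, 0, 0], [0.0, 0, 0]]),
}
Obs = np.array([[0.5, 0.3, 0.2],
                [0.2, 0.6, 0.1],
                [0.3, 0.1, 0.7]])
g = np.array([1.0, -1.0, 1.0])          # eq. (11) with desired landmark L2

def cost_to_go(P, horizon):
    """Finite-horizon DP of Theorem 4.1 on the belief vector P,
    by enumerating the possible observation sequences."""
    if horizon == 0:
        return g @ P, None              # eq. (13)
    best, best_u = np.inf, None
    for u, Au in A.items():
        pred = Au @ P                   # eq. (9)
        J = 0.0
        for z in range(3):
            post = Obs[z] * pred        # eq. (10), unnormalized
            pz = post.sum()             # Pr(z | I_k, u)
            if pz > 0:
                J += pz * cost_to_go(post / pz, horizon - 1)[0]
        if J < best:
            best, best_u = J, u
    return best, best_u

def run(target=1, x=0, stages=3, max_rounds=5):
    """Run the three-stage controller, repeating while the belief in
    the target landmark remains below 0.95."""
    P = np.full(3, 1 / 3)               # uniform initial belief
    for _ in range(max_rounds):
        for k in range(stages):
            u = cost_to_go(P, stages - k)[1]
            x = rng.choice(3, p=A[u][:, x])     # true landmark transition
            z = rng.choice(3, p=Obs[:, x])      # noisy observation
            post = Obs[z] * (A[u] @ P)          # eqs. (9)-(10)
            P = post / post.sum()
        if P[target] >= 0.95:
            break
    return P, x
```

Because the matrices make L2 reachable only probabilistically from L1, individual runs may need several three-stage rounds, which mirrors the replanning behavior described above.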
While more robust plans could certainly be designed, some level of uncertainty was desired to show the use of the optimal controller.

The a priori observation probabilities were chosen to be (with the notation Pr(i|j) = Pr(z = i | x = L_j))

  Pr(1|1) = 0.5   Pr(1|2) = 0.3   Pr(1|3) = 0.2
  Pr(2|1) = 0.2   Pr(2|2) = 0.6   Pr(2|3) = 0.1
  Pr(3|1) = 0.3   Pr(3|2) = 0.1   Pr(3|3) = 0.7

The Markov matrices were determined by running each plan 100 times from each landmark, with actuator noise sampled from a N(0, 0.01) distribution.

[Fig. 3. L1 to L2: True and observed landmarks.]

[Fig. 4. L1 to L2: Executed plans.]

VI. CONCLUSIONS

In this paper we presented an approach to landmark-based navigation for mobile robots, intended for applications in expansive or sparse environments and designed to handle the noisy sensors and actuators one finds in real-world robotics. Under this approach the set of landmarks is viewed as a controlled Markov chain where the controls are feedback control laws encoded in a motion description language. Global information is thus replaced by local information around each landmark and the connections between those landmarks.

An optimal controller was developed using dynamic programming that maximizes the probability of steering the robot to a desired landmark in N steps. This controller was applied to a simulated robot and a typical run presented. The simulation shows the robustness to actuator and sensor noise afforded to the controller by the design of the underlying framework. We note that the controller presented here is quite a simple one; more effective ones can certainly be designed.

There are several areas of ongoing work. We are currently implementing the approach on a physical system in a large environment. Since it is not practical to run a plan thousands of times in the physical world, we are developing a simulator which interfaces to our implementation of MDLe [3] to determine the Markov matrices. We are also exploring techniques to identify which landmark the robot is currently on, questions about when we can uniquely localize ourselves on a given set of landmarks (an observability question related to work in [13]), and how to autonomously explore an unknown environment and develop the Markov-chain based representation proposed here.

VII. ACKNOWLEDGEMENTS

The authors gratefully acknowledge the help of Aaron Greene in developing the code for the simulation. This research was supported by NSF Grant EIA 0088081 and by AFOSR Grant No. F496200110415. The first author was also supported by a grant from the ARCS Foundation.

VIII. REFERENCES

[1] R. W. Brockett. Hybrid models for motion control systems. In H. Trentelman and J. Willems, editors, Perspectives in Control, pages 29-54. Birkhäuser, Boston, 1993.
[2] V. Manikonda, P. S. Krishnaprasad, and J. Hendler. Languages, behaviors, hybrid architectures and motion control. In J. Baillieul and J. C. Willems, editors, Mathematical Control Theory, pages 199-226. Springer, 1998.
[3] D. Hristu, S. Andersson, F. Zhang, P. Sodre, and P. S. Krishnaprasad. A motion description language for hybrid systems programming. Submitted to IEEE Transactions on Robotics and Automation, 2003.
[4] D. Hristu-Varsakelis and S. Andersson. Directed graphs and motion description languages for robot navigation and control. In Proc. of the IEEE Conf. on Robotics and Automation, pages 2689-2694, 2002.
[5] T. J. Prescott. Spatial representation for navigation in animats. Adaptive Behavior, 4(2):85-123, 1996.
[6] A. Bandera, C. Urdiales, and F. Sandoval. Autonomous global localisation using Markov chains and optimised sonar landmarks. In Proc. of the IEEE/RSJ Int. Conference on Intelligent Robots and Systems, pages 288-293, 2000.
[7] B. Yamauchi. Mobile robot localization in dynamic environments using dead reckoning and evidence grids. In Proc. of the IEEE Int. Conference on Robotics and Automation, pages 1401-1406, 1996.
[8] A. Schultz, W. Adams, and B. Yamauchi. Integrating exploration, localization, navigation and planning with a common representation. Autonomous Robots, 6(3):293-308, June 1999.
[9] A. Lambert and Th. Fraichard. Landmark-based safe path planning for car-like robots. In Proc. of the IEEE Int. Conference on Robotics and Automation, pages 2046-2051, 2000.
[10] B. Kuipers. The spatial semantic hierarchy. Artificial Intelligence, 119:191-233, 2000.
[11] M. Martin and H. Moravec. Robot evidence grids. Technical Report CMU-RI-TR-96-06, The Robotics Institute, Carnegie Mellon University, 1996.
[12] M. Egerstedt and R. Brockett. Feedback can reduce the specification complexity of motor programs. Submitted to IEEE Trans. Robotics and Automation, 2002.
[13] M. Egerstedt and D. Hristu-Varsakelis. Observability and policy optimization for mobile robots. In Proc. of the 2002 IEEE Conference on Decision and Control, pages 3596-3601, 2002.
[14] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, 1995.