Clemson University
College of Engineering and Science
Control and Robotics (CRB) Technical Report

Number: CU/CRB/3/14/06/#1
Title: Velocity and Structure Estimation of a Moving Object Using a Moving Monocular Camera
Authors: V. K. Chitrakaran, D. M. Dawson, J. Chen, and H. Kannan

Velocity and Structure Estimation of a Moving Object Using a Moving Monocular Camera

V. K. Chitrakaran†, D. M. Dawson, J. Chen, and H. Kannan

Abstract—In this paper, we present the development of a vision-based estimator for the simultaneous determination of the velocity and structure of an object (i.e., the Euclidean position of its feature points) for the general case where both the object and the camera are moving relative to an inertial frame of reference. The velocity estimation itself requires no explicit kinematic model, while the adaptive structure estimator, synthesized utilizing Lyapunov design methods, is built upon kinematic relationships that rely on homography-based techniques.
I. INTRODUCTION

The application of a camera as a sensor for acquiring the 3D structure of a scene is known as "Structure from Motion" (SfM) in the computer vision literature [17]. The problem usually involves a camera mounted on a moving platform, such as a mobile robot, whose image data is utilized to map the Euclidean positions of static landmarks or visual features in the environment. Recent applications of this technique include aerial surveillance and mapping systems such as the work presented in [10] and [12]. A significant extension to the above problem is the case where both the object of interest and the camera are in motion, with potential impact on such applications as vision-based collision avoidance systems for autonomous guidance of multiple vehicles on highways [9]. In this paper, we propose a nonlinear estimation strategy to identify the Euclidean structure and velocity of a moving object using a calibrated monocular moving camera. The proposed algorithm relies on the availability of a reference image of the target object, captured at a known orientation relative to the camera. For the general case where both the camera and the object are in motion relative to an inertial frame, a single geometric length on the object is assumed to be known, and sensors mounted on the camera are assumed to provide camera velocity information. Typically, linearization-based methods such as extended Kalman filtering [1], [5], [17] provide the underlying algorithmic foundation for most SfM results, although the problem of estimating 3D information from 2D images is inherently nonlinear. In this work, we approach the problem using nonlinear system analysis tools. Equations for the motion kinematics are first developed in terms of Euclidean and image-space information based on the approach presented in [16]. An integral feedback estimation method introduced in [4] is then employed to identify the linear and angular velocity of the moving object. The estimated velocities facilitate the development of a measurable error system that can be used to formulate a nonlinear least-squares adaptive update law for Euclidean structure estimation. Then, the satisfaction of a persistent excitation condition (similar to [2] and others) allows the determination of the coordinates of all the feature points on the object relative to the camera.

Although the requirement of velocity sensors on the camera might seem too restrictive at first, the proposed algorithm does allow for some interesting applications. For example, a camera on board an aerial reconnaissance vehicle could be employed to track and estimate the dimensions of ground targets (such as moving vehicles) utilizing a single snapshot as the reference image, captured, for example, by a satellite. In this case, the camera velocity information can be obtained from the inertial navigation system on board the aircraft. Similarly, a pan-tilt camera could be employed as a passive radar for pose and velocity recovery of a rigid moving object in its field of view. The fact that camera motion is allowed frees the moving object from any restrictive assumptions on its trajectory, since the camera orientation can be continuously updated in order to keep the object within its field of view. A system based on this technique could also be applied in the microscopic domain: a camera equipped with magnifying lenses could map the structure and motion of objects too small to have motion sensors embedded on them.

The remainder of this paper is organized as follows. In Section II, we begin with the development of a geometric model and a homography relating the pixel coordinates of visual markers on the object, extracted from the reference image and from the continuous sequence of images from the camera. Section III describes the motion kinematics between the camera and the target object. This is followed by the development of estimators for relative velocity and Euclidean position in Sections IV and V, respectively. A Lyapunov stability analysis for the Euclidean position estimation is provided in Section V-A. In Appendix II, we illustrate the cases where either the object or the camera is stationary relative to the inertial frame.
These special cases add to the range of practical applications of this algorithm, especially since some of the assumptions necessary for the general case can be relaxed. Specifically, no velocity sensors on the camera are required for the special cases.

This work was supported in part by two DOC Grants, an ARO Automotive Center Grant, a DOE Contract, a Honda Corporation Grant, and a DARPA Contract. The authors are with the Dept. of Electrical & Computer Engineering, Clemson University, SC, USA. †Corresponding Author: Phone/Fax: (864) 656-7708/(864) 656-7218, Email: [email protected]

II. GEOMETRIC MODEL

We begin the development of a geometric model relating the camera and the object by introducing some notation.

[Fig. 1. The geometry of the camera frame and the object frame relative to the inertial frame.]

In Figure 1, the inertial, camera, and object frames are denoted by the orthogonal coordinate frames I, C, and O, respectively. For the sake of simplicity, it is assumed that the origin of C coincides with the optical center of the camera. The vector xc(t) ∈ R3 and the matrix Rc(t) ∈ SO(3) denote the position and orientation, respectively, of the camera relative to the inertial frame I, expressed in the coordinates of I, such that Rc(t) defines the mapping Rc : C → I. Similarly, the position and orientation of the object frame relative to the inertial frame are quantified by xo(t) ∈ R3 and Ro(t) ∈ SO(3), respectively, where Ro(t) is the mapping Ro : O → I. We also define the quantities xco(t) ∈ R3 and Rco(t) ∈ SO(3) to be the position and orientation of the object frame relative to the camera frame, such that Rco : O → C, expressed in the coordinates of the camera frame C. The three-dimensional structure of a rigid object in the field of view of the camera is described in terms of the 3D locations of various visual features on the object relative to the camera frame.

If the Euclidean coordinates of the ith feature point Oi on the object are denoted by the constant si ∈ R3 in the object frame O, and by m̄i(t) ≜ [m̄ix m̄iy m̄iz]^T ∈ R3 in the camera frame C, then it can be seen from Figure 1 that

m̄i = xco + Rco si. (1)

We now develop a geometric relationship that describes how the image coordinates of a feature point on the object change with relative motion between the camera and the object. To this end, we define a reference position of the object relative to the camera, denoted by O∗ in Figure 1. At this reference location relative to the camera, the position and orientation of O∗, and the Euclidean coordinates of the feature points relative to the camera frame, are denoted by the constant terms x∗co ∈ R3, R∗co ∈ SO(3), and m̄∗i ∈ R3, respectively. Hence, similar to (1), we have

m̄∗i = x∗co + R∗co si. (2)

After solving (2) for si and then substituting the resulting expression into (1), we have

m̄i = x̄ + R̄ m̄∗i (3)

where R̄(t) ∈ SO(3) and x̄(t) ∈ R3 are new rotational and translational variables, respectively, defined as follows

R̄ = Rco (R∗co)^T, x̄ = xco − R̄ x∗co. (4)

Let three of the non-collinear feature points on the object, denoted by Oi ∀i = 1, 2, 3, define the plane π in frame O, and π∗ in O∗. As illustrated in Figure 1, n∗ ∈ R3 denotes the constant normal to the plane π∗ expressed in the coordinates of C, and the constant projections of m̄∗i along the unit normal n∗, denoted by d∗i ∈ R, are given by

d∗i = n∗^T m̄∗i. (5)

Using (5), it can easily be seen that the relationship in equation (3) can be expressed as follows

m̄i = (R̄ + (1/d∗i) x̄ n∗^T) m̄∗i ≜ H m̄∗i (6)

where H(t) ∈ R3×3 denotes a Euclidean homography [14]. To express the above relationship in terms of the measurable image-space coordinates of the visual features relative to the camera frame, the normalized Euclidean coordinates mi(t), m∗i ∈ R3 are defined as follows

mi ≜ m̄i/m̄iz, m∗i ≜ m̄∗i/m̄∗iz. (7)

The image coordinates of these feature points expressed relative to C are denoted by pi(t), p∗i ∈ R3 as follows

pi = [ui vi 1]^T, p∗i = [u∗i v∗i 1]^T. (8)

The image coordinates and the normalized Euclidean coordinates are related by the pin-hole camera model [15] such that

pi = A mi, p∗i = A m∗i (9)

where A ∈ R3×3 is a known, constant, upper-triangular, and invertible intrinsic camera calibration matrix. From (6) and (9), the relationship between the image coordinates of corresponding feature points in O and O∗ can be expressed as follows

pi = (m̄∗iz/m̄iz) A (R̄ + x̄hi n∗^T) A^-1 p∗i ≜ αi G p∗i (10)

where αi ∈ R denotes the depth ratio, and x̄hi(t) = x̄(t)/d∗i ∈ R3 denotes the scaled translation vector. The matrix G(t) ∈ R3×3 defined in (10) is a full-rank homogeneous collineation matrix defined up to a scale factor [15]. If the structure of the object is planar, all feature points lie on the same plane, and hence the distances d∗i defined in (5) are the same for all feature points, henceforth denoted as d∗. In this case, the collineation G(t) is defined up to the same scale factor, and so one of the elements of G can be set to unity without loss of generality. Given a reference image of the object corresponding to O∗, and a continuous stream of images of the object from the camera corresponding to O at every instant of time, G(t) can be estimated from the set of linear equations (10) obtained from at least four matched feature points (pi(t), p∗i) that are coplanar but non-collinear. If the structure of the object is not planar, the Virtual Parallax method described in [15] could be utilized, where three of the non-collinear feature points on the object are used to define a virtual plane at the distance d∗ from the camera. An overview of the determination of the collineation matrix G(t) and the depth ratios αi(t) for both the planar and non-planar cases is also given in [3]. Based on the fact that the intrinsic camera calibration matrix A is known a priori, we can then determine the Euclidean homography H(t). By utilizing various techniques (see algorithms in [8], [14], [21]), H(t) can be decomposed into its constituent rotation matrix R̄(t), unit normal vector n∗, scaled translation vector x̄h(t) ≜ x̄/d∗, and depth ratios αi(t). It is assumed that the constant rotation matrix R∗co is known; Rco(t) can therefore be computed from (4). Hence, Rco(t), R̄(t), x̄h(t), and αi(t) are known signals that can be used in the subsequent analysis.

Remark 1: The subsequent development requires that the constant rotation matrix R∗co be known. We consider this to be a mild assumption since the reference image of the object can be acquired offline after placing the object or the camera at some known orientation relative to each other.
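As a concrete illustration of the estimation and decomposition steps above, the following minimal Python/NumPy sketch uses OpenCV's findHomography and decomposeHomographyMat; the function names and array layouts are our assumptions, and the decomposition returns up to four candidate (R̄, x̄h, n∗) triples from which the physically consistent one must still be selected (e.g., via the positive-depth constraint).

    import cv2
    import numpy as np

    def estimate_homography(p_ref, p_cur, A):
        # Estimate the collineation G of Eq. (10) from N >= 4 matched pixel
        # coordinates (N x 2 arrays, reference image -> current image), then
        # form the Euclidean homography H = inv(A) G A of Eq. (6).
        G, _ = cv2.findHomography(p_ref.astype(np.float64),
                                  p_cur.astype(np.float64), method=0)
        G /= G[2, 2]                     # one element of G set to unity
        H = np.linalg.inv(A) @ G @ A
        return G, H

    def decompose_homography(G, A, p_ref):
        # Decompose G into candidate (R_bar, x_h, n_star) triples and compute
        # the depth ratios alpha_i = 1 / [H m*_i]_z implied by Eq. (6)-(7), (10).
        _, Rs, ts, ns = cv2.decomposeHomographyMat(G, A)
        H = np.linalg.inv(A) @ G @ A
        m_ref = np.linalg.inv(A) @ np.c_[p_ref, np.ones(len(p_ref))].T
        alpha = 1.0 / (H @ m_ref)[2, :]
        return Rs, ts, ns, alpha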
III. KINEMATICS

Let vc(t), ωc(t) ∈ R3 denote the translational and rotational velocities of the camera, relative to I and expressed in the coordinates of I. In the coordinates of C, the same physical quantities relative to I are denoted as vcc(t), ωcc(t) ∈ R3. Similarly, vo(t), ωo(t) ∈ R3 and voo(t), ωoo(t) ∈ R3 denote the translational and rotational velocities of the object frame, all relative to I, expressed in the coordinates of I and O, respectively. These velocities are related to each other in the following manner

[vcc^T ωcc^T]^T = [(Rc^T vc)^T (Rc^T ωc)^T]^T, (11)
[voo^T ωoo^T]^T = [(Ro^T vo)^T (Ro^T ωo)^T]^T. (12)

To quantify the translation between the coordinate frames O∗ and O, we define ev(t) ∈ R3 in terms of the image coordinates of one of the feature points on the plane π. For notational simplicity, we chose O1 and hence

ev ≜ [u1 − u∗1, v1 − v∗1, −ln(α1)]^T. (13)

The signal ev(t) is measurable since the first two elements of the vector are obtained from the images, and the last element is available from known signals as discussed in the previous section. After taking the time derivative of (13), the following translational kinematics can be obtained (see Appendix I for details)

ėv = (α1/m̄∗1z) Ae1 [S(xco) ωcc − Rco vr + Rco S(s1) ωr] (14)

where S(·) ∈ R3×3 denotes the skew-symmetric matrix form of a vector, as defined in [19], and vr(t), ωr(t) ∈ R3 are the relative translational and rotational velocities between the camera and the object, defined in the following manner

vr = Ro^T (vc − vo), ωr = Ro^T (ωc − ωo). (15)

In (14), Aei(t) ∈ R3×3 is a function of the camera intrinsic calibration parameters and the image coordinates of the ith feature point, as shown below

Aei ≜ A − [0 0 ui; 0 0 vi; 0 0 0]. (16)

Similarly, to quantify the rotation between O∗ and O, we define eω(t) ∈ R3 using the axis-angle representation [19] as follows

eω ≜ μφ (17)

where μ(t) ∈ R3 represents a unit rotation axis, and φ(t) ∈ R denotes the rotation angle about μ(t), confined to the region −π < φ(t) < π, and defined as follows

φ = cos^-1((tr(R̄) − 1)/2), S(μ) = (R̄ − R̄^T)/(2 sin(φ)). (18)

In (18), the notation tr(·) denotes the trace of a matrix. After taking the time derivative of (17), the following expression for the rotational kinematics is obtained (see Appendix I for details)

ėω = −Lω Rco ωr (19)

where the Jacobian-like term Lω(t) ∈ R3×3 is given by the following expression

Lω = I3 − (φ/2) S(μ) + (1 − sinc(φ)/sinc²(φ/2)) S(μ)² (20)

and sinc(φ) ≜ sin(φ)/φ.
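As an illustration of (17), (18), and (20), the sketch below computes eω and Lω directly from a decomposed rotation R̄; it assumes −π < φ < π with φ ≠ 0, and the helper names are ours.

    def skew(u):
        # Skew-symmetric matrix such that skew(u) @ v = np.cross(u, v).
        return np.array([[0.0, -u[2], u[1]],
                         [u[2], 0.0, -u[0]],
                         [-u[1], u[0], 0.0]])

    def rotation_error_and_jacobian(R_bar):
        # e_w = mu * phi from Eq. (17)-(18) and L_w from Eq. (20).
        phi = np.arccos(0.5 * (np.trace(R_bar) - 1.0))
        S_mu = (R_bar - R_bar.T) / (2.0 * np.sin(phi))     # Eq. (18)
        mu = np.array([S_mu[2, 1], S_mu[0, 2], S_mu[1, 0]])
        sinc = lambda x: np.sinc(x / np.pi)                # sin(x)/x
        L_w = (np.eye(3) - 0.5 * phi * S_mu
               + (1.0 - sinc(phi) / sinc(0.5 * phi) ** 2) * (S_mu @ S_mu))
        return mu * phi, L_w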
From (14) and (19), the kinematics of the relative motion between the camera and the object can thus be expressed as follows

ė = J v + f (21)

where e(t) ≜ [ev^T eω^T]^T ∈ R6 and v(t) ≜ [vr^T ωr^T]^T ∈ R6. The matrix J(t) ∈ R6×6 and the vector f(t) ∈ R6 in (21) are defined as follows

J = [ −(α1/m̄∗1z) Ae1 Rco   (α1/m̄∗1z) Ae1 Rco S(s1) ;
      0_3×3                −Lω Rco ], (22)

f = [ (α1/m̄∗1z) Ae1 S(xco) ωcc ; 0_3 ] (23)

where 0_3×3 ∈ R3×3 denotes a 3×3 zero matrix, and 0_3 ∈ R3 denotes a zero vector.

We assume that a single geometric length s1 ∈ R3 between two feature points on the object is known. This allows us to compute x∗co using the following alternative expression

x∗co = [diag(γ̄1 − γ̄2) + R̄]^-1 [diag(γ̄1 − γ̄2) R∗co − Rco] s1 (24)

where

γ̄1 = [diag(m∗1)]^-1 (n∗^T m∗1) x̄h, (25)
γ̄2 = [diag(m∗1)]^-1 (m1/α1) (26)

and m1(t), m∗1, x̄h(t), and α1(t) are all measurable signals, so x∗co can be computed from (24). From (2), m̄∗1 = x∗co + R∗co s1, and hence the constant depth m̄∗1z is also available; the distance d∗ then follows from (5), x̄(t) from x̄h(t), and finally xco(t) from (4). Also, as mentioned previously, it is assumed that the velocities vcc(t), ωcc(t) are available from sensors on the camera. Hence, each element of J(t) and f(t) given in (22) and (23) is known.

Remark 2: By exploiting the fact that μ(t) is a unit vector (i.e., ‖μ‖² = 1), the determinant of Lω(t) can be derived as follows [16]

det Lω = 1/sinc²(φ/2). (27)

From (27), it is clear that Lω(t) is only singular for multiples of 2π (i.e., out of the assumed workspace). It can also be seen that det(Ae1 Rco) ≠ 0. Hence, the matrix J(t) is invertible [11].
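A minimal sketch of how J(t) and f(t) in (22)-(23) could be assembled from the known signals, reusing numpy and skew(·) from the earlier sketches (the argument names are our assumptions):

    def kinematic_matrices(A, u1, v1, alpha1, m1z_star, R_co, x_co, s1, w_cc, L_w):
        # Assemble J(t) and f(t) of Eq. (22)-(23); m1z_star is the constant
        # depth mbar*_1z and (u1, v1) are the current pixel coordinates of O1.
        Ae1 = A - np.array([[0.0, 0.0, u1],
                            [0.0, 0.0, v1],
                            [0.0, 0.0, 0.0]])              # Eq. (16)
        k = alpha1 / m1z_star
        J = np.block([[-k * Ae1 @ R_co, k * Ae1 @ R_co @ skew(s1)],
                      [np.zeros((3, 3)), -L_w @ R_co]])    # Eq. (22)
        f = np.concatenate([k * Ae1 @ skew(x_co) @ w_cc, np.zeros(3)])  # Eq. (23)
        return J, f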
IV. RELATIVE VELOCITY ESTIMATION

In [20], a model-free estimator for the asymptotic identification of a velocity signal was presented, utilizing only the measured position signal for estimation. Based on this work, in [4], we presented a detailed analysis of the application of this observer to the estimation of the velocity of a moving object in the field of view of a fixed camera. Specifically, designating ê(t) ∈ R6 as the estimate for the kinematic signal e(t), the observer was designed as follows

ê ≜ ∫t0..t (K + I6×6) ẽ(τ) dτ + ∫t0..t ρ sgn(ẽ(τ)) dτ + (K + I6×6) ẽ(t) (28)

where ẽ(t) ≜ e(t) − ê(t) ∈ R6 is the estimation error signal, K, ρ ∈ R6×6 are positive-definite constant diagonal gain matrices, I6×6 ∈ R6×6 is the 6×6 identity matrix, t0 is the initial time, and sgn(ẽ(t)) denotes the standard signum function applied to each element of the vector ẽ(t). The above estimator is guaranteed to asymptotically identify the signal ė(t) provided the elements of the gain matrix ρ satisfy ρi ≥ ‖ëi‖∞ + ‖e⃛i‖∞ ∀i = 1, 2, ..., 6. It is assumed that the relative velocity, acceleration, and jerk between the moving object and the camera are bounded, i.e., v(t), v̇(t), v̈(t) ∈ L∞. Given these assumptions, the structure of (22) and (23) allows us to show that the bounds on ė(t), ë(t), and e⃛(t) are satisfied. Hence, based on the analysis in [4], dê/dt → ė(t) as t → ∞. Since J(t) is known and invertible, the six degree-of-freedom relative velocity between the object and the camera can be identified as follows

v̂(t) = J^-1(t) [dê/dt − f(t)] (29)

and v̂(t) → v(t) as t → ∞.

Remark 3: The definition of the relative velocity in (15) can be expressed in the following equivalent form

vr = Rco^T vcc − voo, ωr = Rco^T ωcc − ωoo. (30)

Since [vcc^T ωcc^T]^T is known, (29) and (30) allow us to recover the object velocity [voo^T ωoo^T]^T.
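For simulation purposes, a simple Euler discretization of (28)-(29) could look as follows; the fixed step dt, the stored integral state z, and the numerical differentiation of ê are implementation choices on our part, and in practice the signum term is often smoothed to reduce chattering.

    def observer_step(e, z, e_hat_prev, K, rho, J, f, dt):
        # One Euler step of the observer in Eq. (28) plus the velocity
        # reconstruction of Eq. (29). z holds the two integral terms; solving
        # e_hat = z + (K + I)(e - e_hat) for e_hat avoids an algebraic loop.
        I6 = np.eye(6)
        e_hat = np.linalg.solve(K + 2.0 * I6, z + (K + I6) @ e)
        e_tilde = e - e_hat
        z = z + dt * ((K + I6) @ e_tilde + rho @ np.sign(e_tilde))
        de_hat = (e_hat - e_hat_prev) / dt      # numerical surrogate for d(e_hat)/dt
        v_hat = np.linalg.solve(J, de_hat - f)  # Eq. (29): [vr_hat; wr_hat]
        return e_hat, z, v_hat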
V. EUCLIDEAN STRUCTURE ESTIMATION

To facilitate the development of an estimator for the Euclidean coordinates of the feature points on the object (i.e., the vector si relative to the object frame O, and m̄i(t) and m̄∗i relative to the camera frame C, for all i feature points on the object), we first define the extended image coordinates pei(t) ∈ R3 as

pei ≜ [ui vi −ln(αi)]^T. (31)

From the development in the previous section for the translational kinematics, the following expression for the time derivative of (31) can be obtained

ṗei = −(αi/m̄∗iz) Aei Rco vr + (αi/m̄∗iz) Aei S(xco) ωcc + (αi/m̄∗iz) Aei Rco S(si) ωr
    = W1i Vvw θi + W2i [θi]1 (32)

where W1i(·) ∈ R3×3, W2i(·) ∈ R3, Vvw(t) ∈ R3×4, and θi ∈ R4 are as follows

W1i = −αi Aei Rco, (33)
W2i = αi Aei S(xco) ωcc, (34)
Vvw = [vr S(ωr)], (35)
θi = [1/m̄∗iz, si^T/m̄∗iz]^T. (36)

In (32), the notation [θi]1 denotes the first element of the vector θi. Note that in (32), we have linearly parameterized the time derivative of the extended image coordinates in terms of the known or measurable quantities W1i(·) and W2i(·) and the unknowns Vvw(t) and θi. An estimate for Vvw(t), denoted by V̂vw(t), is available from re-arranging the vector v̂(t) computed from (29). Our objective in this section is to develop an estimator for the unknown constant θi in (36), which will allow us to compute the Euclidean coordinates of the ith feature point on the object. To facilitate this objective, the parameter estimation error θ̃i(t) is defined as follows

θ̃i ≜ θi − θ̂i (37)

where θ̂i(t) ∈ R4 is a subsequently designed parameter update signal. Motivated by the subsequent stability analysis, we introduce a measurable filter signal ζi(t) ∈ R3×4 and an un-measurable filter signal ηi(t) ∈ R3, defined as follows

ζ̇i = −βi ζi + W3i, (38)
η̇i = −βi ηi + W1i Ṽvw θi (39)

where βi ∈ R is a scalar positive gain, Ṽvw(t) ≜ Vvw(t) − V̂vw(t) ∈ R3×4 is an estimation error signal, and

W3i = W1i V̂vw + [W2i 0_3×3]. (40)

The stability analysis presented in the next sub-section motivated the following design for the estimates of the extended image coordinates pei(t), denoted by p̂ei(t) ∈ R3, and an adaptive least-squares update law [18] for the Euclidean parameters θi, as follows

dp̂ei/dt = βi p̃ei + ζi dθ̂i/dt + W1i V̂vw θ̂i + W2i [θ̂i]1, (41)
dθ̂i/dt = Li ζi^T p̃ei (42)

where p̃ei(t) ≜ pei(t) − p̂ei(t) ∈ R3 denotes the measurable estimation error signal. Li(t) ∈ R4×4 is an estimation gain recursively computed as

d(Li^-1)/dt = ζi^T ζi (43)

and initialized such that Li^-1(0) > 0 to ensure that it is positive definite for all time t, as required by the stability analysis. From (32) and (41), the time derivative of the estimation error in the extended image coordinates can be computed as follows

dp̃ei/dt = −βi p̃ei − ζi dθ̂i/dt + W1i Ṽvw θi + W3i θ̃i. (44)

The above equation and (39) allow us to develop an alternate expression for p̃ei(t) as follows

p̃ei = ζi θ̃i + ηi. (45)
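The following sketch collects (38) and (41)-(43) into one Euler-discretized update for a single feature point; the state variables and step size are our assumptions, and W1, W2, and V̂vw are assumed to be formed from the current measurements as in (33)-(35) and (29).

    def vvw_estimate(v_hat):
        # Re-arrange v_hat = [vr_hat; wr_hat] of Eq. (29) into the 3x4 matrix
        # V_vw_hat = [vr_hat, S(wr_hat)] of Eq. (35).
        return np.column_stack([v_hat[:3], skew(v_hat[3:])])

    def structure_update_step(p_e, p_e_hat, theta_hat, L_inv, zeta,
                              W1, W2, Vvw_hat, beta, dt):
        # One Euler step of the filter (38), estimator (41), least-squares
        # update (42), and gain propagation (43) for one feature point.
        p_tilde = p_e - p_e_hat
        theta_dot = np.linalg.solve(L_inv, zeta.T @ p_tilde)         # Eq. (42)
        W3 = W1 @ Vvw_hat + np.column_stack([W2, np.zeros((3, 3))])  # Eq. (40)
        p_e_hat = p_e_hat + dt * (beta * p_tilde + zeta @ theta_dot
                                  + W1 @ Vvw_hat @ theta_hat
                                  + W2 * theta_hat[0])               # Eq. (41)
        theta_hat = theta_hat + dt * theta_dot
        zeta = zeta + dt * (-beta * zeta + W3)                       # Eq. (38)
        L_inv = L_inv + dt * (zeta.T @ zeta)                         # Eq. (43)
        return p_e_hat, theta_hat, L_inv, zeta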
A. Stability Analysis

Theorem 1: The update law defined in (42) ensures that θ̃i(t) → 0 as t → ∞ provided that the following persistent excitation condition [18] holds

γ1 I4×4 ≤ ∫t0..t0+T ζi^T(τ) ζi(τ) dτ ≤ γ2 I4×4 (46)

and provided that the gains satisfy the following inequalities

βi > k1i + k2i ‖W1i‖²∞, (47)
k1i > 2 (48)

where t0, γ1, γ2, T, k1i, k2i ∈ R are positive constants, I4×4 ∈ R4×4 is the 4×4 identity matrix, and the notation ‖·‖∞ denotes the induced ∞-norm of a matrix [18].

Proof: Let V(t) ∈ R denote a non-negative scalar function defined as follows

V ≜ (1/2) θ̃i^T Li^-1 θ̃i + (1/2) ηi^T ηi. (49)

After taking the time derivative of (49), the following expression can be obtained

V̇ = −(1/2)‖ζi θ̃i‖² − θ̃i^T ζi^T ηi − βi ‖ηi‖² + ηi^T W1i Ṽvw θi
  ≤ −(1/2)‖ζi θ̃i‖² − βi ‖ηi‖² + ‖θi‖ ‖W1i‖∞ ‖Ṽvw‖∞ ‖ηi‖
    + ‖ζi θ̃i‖ ‖ηi‖ − k1i ‖ηi‖² + k1i ‖ηi‖²
    + k2i ‖W1i‖²∞ ‖ηi‖² − k2i ‖W1i‖²∞ ‖ηi‖² (50)

where k1i, k2i ∈ R are the positive constants mentioned previously. Further simplification of (50), after utilizing the nonlinear damping argument [13], results in the following expression

V̇ ≤ −(1/2 − 1/k1i)‖ζi θ̃i‖² − (βi − k1i − k2i ‖W1i‖²∞)‖ηi‖² + (1/k2i)‖θi‖² ‖Ṽvw‖²∞. (51)

The gains βi, k1i, and k2i are selected to ensure that

1/2 − 1/k1i ≥ μ1i > 0, (52)
βi − k1i − k2i ‖W1i‖²∞ ≥ μ2i > 0 (53)

where μ1i, μ2i ∈ R are positive constants. The above gain conditions allow us to further upper-bound the time derivative of (49) in the following manner

V̇ ≤ −μ1i ‖ζi θ̃i‖² − μ2i ‖ηi‖² + (1/k2i)‖θi‖² ‖Ṽvw‖²∞. (54)

In the analysis provided in the development of the kinematic estimator in [4], it was shown that the filter signal r(t) ∈ R6, defined as r(t) = ẽ(t) + dẽ/dt, satisfies r(t) ∈ L∞ ∩ L2. From this result it is easy to show that the signals ẽ(t), dẽ/dt ∈ L∞ ∩ L2 [6]. Since J(t) ∈ L∞ and is invertible, it follows that ṽ(t) = J^-1(t) dẽ/dt ∈ L∞ ∩ L2. Hence it follows that ‖Ṽvw(t)‖²∞ ∈ L1 and

∫0..∞ (1/k2i)‖θi(τ)‖² ‖Ṽvw(τ)‖²∞ dτ ≤ ε (55)

where ε ∈ R is a positive constant. From (49), the integral of (54), and (55), it can be concluded that

∫0..∞ (μ1i ‖ζi(τ) θ̃i(τ)‖² + μ2i ‖ηi(τ)‖²) dτ ≤ V(0) − V(∞) + ε. (56)

Hence, ζi(t) θ̃i(t), ηi(t) ∈ L2. Also, from (56) and the fact that V(t) is non-negative, it can be concluded that V(t) ≤ V(0) + ε ∈ L∞. Therefore, from (49), θ̃i^T(t) Li^-1(t) θ̃i(t), ηi(t) ∈ L∞. Based on the assumption that the persistent excitation condition in (46) is satisfied, and that Li^-1(0) is chosen to be positive definite, we can use (43) to show that Li^-1(t) is always positive definite; hence, it must follow that θ̃i(t) ∈ L∞. Since W3i(·) in (40) is composed of bounded terms, and is the input to the stable filter in (38), it can be shown that ζi(t), ζ̇i(t) ∈ L∞ [6], and consequently, ζi(t) θ̃i(t) ∈ L∞. It follows from (45) that p̃ei(t) ∈ L∞, and hence, from (42), dθ̂i/dt, dθ̃i/dt ∈ L∞. Based on these arguments, it is easy to see that (d/dt)[ζi(t) θ̃i(t)] ∈ L∞. Therefore, ζi(t) θ̃i(t) is uniformly continuous [7], and since we also have that ζi(t) θ̃i(t) ∈ L∞ ∩ L2, we can conclude the following result from Barbalat's Lemma [7]

ζi(t) θ̃i(t) → 0 as t → ∞. (57)

If the signal ζi(t) satisfies the persistent excitation condition given in (46), then it can be concluded [18] from (57) that

θ̃i(t) → 0 as t → ∞. (58) ∎

Remark 4: From (7), (9), (36), (42), and Theorem 1, the constant Euclidean positions of all feature points on the object at the reference position, and the time-varying Euclidean positions of the feature points on the moving object relative to the camera, can be computed as follows

m̄̂∗i(t) = (1/[θ̂i(t)]1) A^-1 p∗i, (59)
m̄̂i(t) = (1/(αi(t) [θ̂i(t)]1)) A^-1 pi(t). (60)
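Once θ̂i has converged, the reconstruction in (59)-(60) is a few lines; the sketch below also recovers si from (36), under the assumption that θ̂i has been obtained from the update law above.

    def euclidean_structure(A, p_star, p, alpha, theta_hat):
        # Remark 4, Eq. (59)-(60): theta_hat[0] estimates 1/mbar*_iz, so the
        # normalized coordinates are divided by it to restore metric scale;
        # s_i follows from Eq. (36) as theta_hat[1:4] / theta_hat[0].
        A_inv = np.linalg.inv(A)
        m_bar_star = (A_inv @ p_star) / theta_hat[0]      # Eq. (59)
        m_bar = (A_inv @ p) / (alpha * theta_hat[0])      # Eq. (60)
        s = theta_hat[1:] / theta_hat[0]
        return m_bar_star, m_bar, s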
VI. CONCLUSIONS

This work developed a technique for the recovery of structure and motion for the general case where both the object of interest and the camera are in motion. Homography-based techniques and adaptive estimation theory provided the basis for the estimator design. Future work will include the validation of the proposed estimator through simulations and experiments. The applicability of this work to tasks such as aerial surveillance of moving targets is well motivated, since no explicit model describing the object or its motion is required by the estimator.

REFERENCES

[1] T. J. Broida, S. Chandrashekhar, and R. Chellappa, "Recursive 3-D Motion Estimation From a Monocular Image Sequence," IEEE Trans. on Aerospace and Electronic Systems, Vol. 26, No. 4, 1990.
[2] X. Chen and H. Kano, "State Observer for a Class of Nonlinear Systems and Its Application to Machine Vision," IEEE Transactions on Automatic Control, Vol. 49, No. 11, 2004.
[3] V. K. Chitrakaran, D. M. Dawson, J. Chen, and W. E. Dixon, "Euclidean Position Estimation of Features on a Moving Object Using a Single Camera: A Lyapunov-Based Approach," Proc. of the 2005 American Control Conference, pp. 4601-4606, June 2005.
[4] V. K. Chitrakaran, D. M. Dawson, W. E. Dixon, and J. Chen, "Identification of a Moving Object's Velocity with a Fixed Camera," Automatica, Vol. 41, No. 3, pp. 553-562, March 2005.
[5] A. Chiuso, P. Favaro, H. Jin, and S. Soatto, "Structure from Motion Causally Integrated Over Time," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, pp. 523-535, April 2002.
[6] C. A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties, Academic Press, 1975.
[7] W. E. Dixon, A. Behal, D. M. Dawson, and S. Nagarkatti, Nonlinear Control of Engineering Systems: A Lyapunov-Based Approach, Birkhäuser Boston, 2003.
[8] O. Faugeras and F. Lustman, "Motion and Structure From Motion in a Piecewise Planar Environment," International Journal of Pattern Recognition and Artificial Intelligence, Vol. 2, No. 3, pp. 485-508, 1988.
[9] J. M. Ferryman, S. J. Maybank, and A. D. Worrall, "Visual Surveillance for Moving Vehicles," Intl. Journal of Computer Vision, Vol. 37, No. 2, pp. 187-197, 2000.
[10] T. Fukao, K. Fujitani, and T. Kanade, "An Autonomous Blimp for a Surveillance System," Proc. of the 2003 IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, pp. 1820-1825, October 2003.
[11] R. Horn and C. Johnson, Matrix Analysis, Cambridge University Press, ISBN: 0521305861, 1985.
[12] T. Kanade, O. Amidi, and Q. Ke, "Real-Time and 3D Vision for Autonomous Small and Micro Air Vehicles," Proc. of the 2004 IEEE Conf. on Decision and Control, pp. 1655-1662, Bahamas, 2004.
[13] M. Krstić, I. Kanellakopoulos, and P. Kokotović, Nonlinear and Adaptive Control Design, New York, NY: John Wiley and Sons, 1995.
[14] Y. Ma, S. Soatto, J. Košecká, and S. Sastry, An Invitation to 3-D Vision, Springer-Verlag, ISBN: 0387008934, 2003.
[15] E. Malis and F. Chaumette, "2 1/2 D Visual Servoing with Respect to Unknown Objects Through a New Estimation Scheme of Camera Displacement," International Journal of Computer Vision, Vol. 37, No. 1, pp. 79-97, 2000.
[16] E. Malis, "Contributions à la modélisation et à la commande en asservissement visuel," Ph.D. Dissertation, University of Rennes I, IRISA, France, Nov. 1998.
[17] J. Oliensis, "A Critique of Structure From Motion Algorithms," Computer Vision and Image Understanding, Vol. 80, No. 2, pp. 172-214, 2000.
[18] S. Sastry and M. Bodson, Adaptive Control: Stability, Convergence, and Robustness, Prentice Hall, Inc.: Englewood Cliffs, NJ, 1989.
[19] M. W. Spong and M. Vidyasagar, Robot Dynamics and Control, John Wiley and Sons, Inc.: New York, NY, 1989.
[20] B. Xian, M. S. de Queiroz, D. M. Dawson, and M. L. McIntyre, "A Discontinuous Output Feedback Controller and Velocity Observer for Nonlinear Mechanical Systems," Automatica, Vol. 40, No. 4, pp. 695-700, April 2004.
[21] Z. Zhang and A. R. Hanson, "Scaled Euclidean 3D Reconstruction Based on Externally Uncalibrated Cameras," IEEE Symposium on Computer Vision, pp. 37-42, 1995.

APPENDIX I
DEVELOPMENT OF TRANSLATION AND ROTATION KINEMATICS

To facilitate the development of the kinematic equations, the following properties of rotation matrices and skew-symmetric matrices are utilized [19]

Ṙc = Rc S(ωcc), (61)
Ṙc = S(ωc) Rc, (62)
Ṙo = S(ωo) Ro, (63)
S(u) ζ = −S(ζ) u, (64)
S(Ru) = R S(u) R^T (65)

where R(t) ∈ SO(3) is a rotation matrix, u(t), ζ(t) ∈ R3 are arbitrary vectors, and the rest of the terms were defined previously.

To develop the translational kinematics, the time derivative of (13) is obtained as follows
[6] PCr.oApe.rtDieess,oAerc,aadnedmiMc.PrVeisdsy,a1s9a7g5a.r, Feedback Systems: Input-Output e˙v = m¯α∗1 Ae1 m¯. 1 (66) iz From Figure 1, the position and orientation of the object 1 relative to the camera are as follows 0.8 sss222xyz xco = RcT (xo−xc) (67) 0.6 Rco = RcTRo. (68) 0.4 0.2 After taking the time derivative of the above equations, and m) utilizing the properties in (61) to (65), we have error ( 0 −0.2 x˙co = S(xco)ωcc+RcT(vo−vc) (69) −0.4 R˙co = RcTS(ωo)Ro−RcTS(ωc)Ro. (70) −0.6 Utilizing(69)and(70),theexpressionforthetimederivative −0.8 of (1) for i=1 is given by the following expression −10 10 20 30 40 50 60 t (sec) . m¯1=S(xco)ωcc−Rcovr +RcoS(s1)ωr. (71) Fig.2. Estimationerrorinthecoordinatesofafeaturepointontheobject After substituting (71) in (66), the translational kinematics (fixedcamera,1%pixelnoiseadded). in (14) can be obtained. The development of rotation kinematics follows the de- scription given in [16], to which the reader is referred to whereLω(t)isdefinedin(20).From(72),(4),(68),andthe fordetails. Only a brief descriptionis givenhere.We define properties in (62) and (63), it can be shown that ω¯(t)∈R3 to be the relative rotational velocity between the frames O∗ and O as observed by the camera. Then it can ω¯ =−Rcoωr (81) be shown that [19] R¯.=S(ω¯)R¯. (72) where ωr(t) ∈ R3 was defined in (15). After multiplying bothsidesof(80)byLω(t),andutilizing(81),therotational The open-loop error system for eω(t) is derived based on kinematics given in (19) can be obtained. the following exponential parameterization [19] R¯ =I3×3+S(µ)sin(φ)+2sin2(φ2)S(µ)2 (73) SPECIAL CASES:AFIXAEPDPECNADMIXERIIA, ORA FIXED OBJECT where I3×3 is the 3×3 identity matrix. Using (73) and its In the previous sections, we presented the development time derivative in (72), it can be shown that for the general case where both the camera and the object S(ω¯)=sin(φ)S(µ˙)+S(µ)φ˙+(1−cos(φ))S(S(µ)µ˙). are in motion relative to an inertial frame. As described in (74) SectionsIVandV,suchatreatmentimposedafewrestrictive To facilitate furtherdevelopment,the time derivativeof (17) assumptions on the system, chief among them being that is determined as follows the persistent excitation condition(cid:1)of (46) was sa(cid:2)tisfied at all times, the camera velocity vT(t) ωT(t) Twas e˙ω(cid:9)=µ˙φ+µφ˙. (cid:10) (75) measurable, a single position vector s1cwc as knocwcn, and the By multiplying (75) by I3×3+S(µ)2 , the following ex- orientation of the object at a reference position relative to pression can be obtained the camera (Rc∗o) was known. The fact that s1 and Rc∗o (cid:9) (cid:10) was available enabled us to compute the depth information I3×3+S(µ)2 e˙ω =µφ˙ (76) m¯∗iz and the position vector xco(t), apart from the rotation where we utilized the following properties matrix Rco(t). As shown below, for the special cases where either the object or the camera are stationary, some of these µTµ = 1, µTµ˙ =0 (77) requirements can be relaxed, increasing their potential for S(µ)2 = µµT −I3×3 (78) application in a wider range of real-world scenarios. These cases are derived in detail in [3], and here we present a Likewise, by multiplying(75)by −S(µ)2 andthen utilizing summary. (77), the following expression is obtained 1) Fixed Camera: When the camera is fixed relative to −S(µ)2e˙ω =µ˙φ. 
From the expression in (74), the properties given in (64), (75), (76), and (79), and the fact that sin²(φ) = (1 − cos(2φ))/2, we can obtain the following expression

ω̄ = Lω^-1 ėω (80)

where Lω(t) is defined in (20). From (72), (4), (68), and the properties in (62) and (63), it can be shown that

ω̄ = −Rco ωr (81)

where ωr(t) ∈ R3 was defined in (15). After multiplying both sides of (80) by Lω(t), and utilizing (81), the rotational kinematics given in (19) can be obtained.

APPENDIX II
SPECIAL CASES: A FIXED CAMERA, OR A FIXED OBJECT

In the previous sections, we presented the development for the general case where both the camera and the object are in motion relative to an inertial frame. As described in Sections IV and V, such a treatment imposed a few restrictive assumptions on the system, chief among them being that the persistent excitation condition of (46) was satisfied at all times, the camera velocity [vcc^T(t) ωcc^T(t)]^T was measurable, a single position vector s1 was known, and the orientation of the object at a reference position relative to the camera (R∗co) was known. The fact that s1 and R∗co were available enabled us to compute the depth information m̄∗iz and the position vector xco(t), apart from the rotation matrix Rco(t). As shown below, for the special cases where either the object or the camera is stationary, some of these requirements can be relaxed, increasing their potential for application in a wider range of real-world scenarios. These cases are derived in detail in [3], and here we present a summary.

1) Fixed Camera: When the camera is fixed relative to the inertial frame, the kinematics expression in (21) is given by

ė = Jfc voo (82)

where Jfc(t) ∈ R6×6 is the following

Jfc = [ (α1/m̄∗1z) Ae1 Rco   −(α1/m̄∗1z) Ae1 Rco S(s1) ;
        0_3×3                Lω Rco ]. (83)

The time derivative of the extended image coordinates in (32) reduces to the expression given below

ṗei = W1i Vvw θi (84)

where W1i(·) = αi Aei Rco ∈ R3×3 and Vvw = [voo S(ωoo)] ∈ R3×4. All assumptions and requirements are the same as in the general case, except that the estimation of the object velocity and structure relative to the fixed camera requires no additional sensors on the camera (since ωcc(t) = 0), and the computation of the position vector xco(t) is not required. As an example, Figure 2 shows the errors in the estimation of the Euclidean coordinates of one of the feature points, in the simulation of a planar object moving in the field of view of a fixed camera.

[Fig. 2. Estimation error in the coordinates of a feature point on the object (fixed camera, 1% pixel noise added); components s2x, s2y, s2z of the error (m) versus t (sec).]

2) Moving Camera: With the object stationary relative to the inertial frame, the kinematics of the moving camera can be expressed as follows

ė = Jmc vcc (85)

where Jmc(t) ∈ R6×6 is the following Jacobian-like matrix

Jmc = [ −(α1/m̄∗1z) Ae1   Ae1 S(m1) ;
        0_3×3            −Lω ]. (86)

In the above expression, notice that all terms are either known a priori or directly measurable, except the constant depth m̄∗1z. If the camera can be moved away from its reference position by a known translation vector x̄k ∈ R3, then m̄∗1z can be computed offline without knowledge of R∗co and s1. Decomposition of the Euclidean homography between the normalized Euclidean coordinates of the feature points obtained at the reference position and at x̄k away from the reference position, respectively, can yield the translation vector x̄k scaled by the plane distance d∗, from which d∗ can be recovered since x̄k is known. Then, it can be seen that

m̄∗1z = d∗/(n∗^T m∗1) = d∗/(n∗^T A^-1 p∗1). (87)

(Note that for any feature point Oi coplanar with π∗, m̄∗iz could be computed this way.) Hence, from the above discussion, we note that it is possible to develop a velocity estimator for the moving-camera case without having to know the reference orientation matrix R∗co or any geometric length s1 on the object. However, as a consequence of the specific approach we have employed in the development of the Euclidean estimator in Section V, we are unable to benefit from this relaxation of the assumptions for velocity estimation. In [3], we formulate the Euclidean estimation in a slightly different manner that allows us to eliminate the requirement of R∗co and s1; i.e., the time derivative of the extended pixel coordinates in (32) is expressed as

ṗei = W1i vcc θi + W2i ωcc (88)

where W1i(·) ∈ R3×3, W2i(·) ∈ R3×3, and θi ∈ R are defined as follows

W1i = −αi Aei, (89)
W2i = Aei S(mi), (90)
θi = 1/m̄∗iz. (91)

A measurable filter signal ζi(t) ∈ R3, similar to (38), is now defined in the following manner

ζ̇i = −βi ζi + W1i v̂cc (92)

where v̂cc(t) ∈ R3 is the estimate for the translational velocity of the camera. In [3], following an approach similar to that presented in Section V, it is shown that the depth parameters θi can be estimated using the adaptive least-squares update law of (42), and subsequently, the Euclidean positions of all feature points on the object relative to the camera can be estimated. Again, the need for additional on-board sensing is eliminated. Also note that, as a consequence of the way the filter signal ζi(t) in (92) is formulated, the persistent excitation condition of (46) is also simplified, as it now depends only on the translational velocity signal.
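Returning to the offline depth computation in (87), a short sketch under the assumption that n∗ and the scaled translation x̄k/d∗ have been obtained from the homography decomposition between the reference view and the displaced view:

    def reference_depth(A, p1_star, n_star, x_k, x_k_scaled):
        # Offline computation of mbar*_1z via Eq. (87): x_k_scaled = x_k / d*
        # comes from the homography decomposition, so d* = |x_k| / |x_k_scaled|.
        d_star = np.linalg.norm(x_k) / np.linalg.norm(x_k_scaled)
        m1_star = np.linalg.inv(A) @ p1_star   # normalized coordinates, Eq. (9)
        return d_star / (n_star @ m1_star)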