Learning Binary Features Online from Motion Dynamics for Incremental Loop-Closure Detection and Place Recognition

Guangcong Zhang¹, Mason J. Lilly², and Patricio A. Vela¹

arXiv:1601.03821v2 [cs.CV] 25 Jan 2016

Abstract—This paper proposes a simple yet effective approach to learning visual features online for improving loop-closure detection and place recognition, based on bag-of-words frameworks. The approach learns a codeword in the bag-of-words model from a pair of matched features in two consecutive frames, such that the codeword has temporally-derived perspective invariance to camera motion. The learning algorithm is efficient: the binary descriptor is generated from the mean image patch, and the mask is learned through discriminative projection by minimizing the intra-class distances among the learned feature and the two original features. A codeword for bag-of-words models is generated by packaging the learned descriptor and mask, with a masked Hamming distance defined to measure the distance between two codewords. The geometric properties of the learned codewords are then mathematically justified. In addition, hypothesis constraints are imposed through temporal consistency in matched codewords, which improves precision. The approach, integrated into an incremental bag-of-words system, is validated on multiple benchmark data sets and compared to state-of-the-art methods. Experiments demonstrate improved precision/recall, outperforming the state of the art with little loss in runtime.

I. INTRODUCTION

Long-distance visual Simultaneous Localization and Mapping (SLAM) experiences drift that requires sophisticated approaches to handle, in both uncooperative [27], [28] and cooperative cases [21], [22], [23]. When the robot revisits a place, the drift can be greatly reduced by imposing geometric constraints in the posterior optimization (e.g., bundle adjustment) [16]. Identifying when a robot has returned to a previously visited place is referred to as loop-closure detection; it plays a key role in visual SLAM systems.

Since [8], appearance-based methods have become prevalent in visual loop-closure detection, due to their decoupling from the location estimate of the robot. A key research question in appearance-based methods is what kind of visual features best describe the scene. Along with the progress of feature descriptor designs, various features have been used in loop-closure detection, from traditional floating-point features such as SIFT [15] and SURF [3] to the recent binary-encoded features [7], [14], [18]. In the typical bag-of-words loop-closure system, a feature is extracted from a single frame after it is matched with the previous frame, then potentially used as a codeword in the vocabulary. Such codewords may not be invariant to the perspective transformation from the robot motion.

A common fact in mobile robot applications, arising from the design of appearance-based loop-closure systems, is that the robot often requires similar motion to trigger a loop-closure: the loop-closing image sequence is captured under similar perspective transformations. The key idea in this paper is to learn codewords by learning feature descriptors invariant to the perspective transformations induced by robot motion. With such codewords, visual features from the same object subjected to perspective distortions are more likely to trigger loop-closure hypotheses and improve the recall. We use binary features due to their overall advantages demonstrated in loop-closure applications (efficient computation with high precision-recall (PR)) [11], [12]. Learning binary features is done by treating the image patches from a matched pair, together with their mean patch, as a single class, then optimizing the binary tests by minimizing the intra-class distance and maximizing the inter-class distance through Linear Discriminant Analysis (LDA) [2], [6], [5]. Furthermore, because a codeword is learned from two consecutive images, its consistent nature implies that if a frame is retrieved as a loop-closure hypothesis, its previous or next frame should also be a hypothesis. Based on this, our algorithm imposes temporal constraints in the hypothesis selection, which further improves the precision. In addition, mathematical analysis shows that the codewords learned by our method have nice geometric properties, which theoretically support the proposed method.

The major contributions of our paper include:
• an efficient algorithm based on LDA for learning codewords invariant to the perspective transformations from robot motion, involving only matrix additions on image patches and bit-wise operations on binary vectors;
• theoretical justification of the geometric properties of the learned codewords, demonstrating that they can be viewed as "centroids" in the space induced by the modified Hamming distance;
• integration into an incremental bag-of-words loop-closure detection system with additional simple hypothesis constraints, demonstrating improved PR with trivial runtime loss on various benchmark data sets.

¹ Guangcong Zhang and Patricio A. Vela are with the School of Electrical & Computer Engineering, and the Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, North Ave NW, Atlanta, GA 30332, USA. ² Mason Lilly is with Transylvania University, Lexington, KY 40508, USA; his work was done at Georgia Tech during a summer research program. {zhanggc, pvela}@gatech.edu, [email protected].

II. RELATED WORK

Research effort has sought to improve the pipeline of appearance-based methods, involving: (1) the visual features chosen as codewords; (2) loop-closure retrieval models, e.g., the probabilistic model in Fab-Map [8] and the bag-of-visual-words model [1], [11]; (3) data structures for vocabulary storage and search, e.g., the Chow-Liu tree [8] and the vocabulary tree for binary features [11]; and (4) online incremental designs that avoid offline training, e.g., the IBuILD system [12]. This review focuses on the visual features used in loop-closure detection, which is highly driven by the development of feature descriptors. Early visual descriptors based on floating-point arithmetic, such as SIFT [15] and SURF [3], are typically too computationally expensive for real-time loop-closure detection. The milestone work Fab-Map [8] mitigates this problem by using quantized SURF descriptors encoded in binary vectors; such an approximation trades precision-recall for runtime. In [1], the authors use raw SIFT descriptors with an additional feature space of local color histograms; with a tree-structured vocabulary in their bag-of-words model, high frame rates are attained with feature retrieval of logarithmic time complexity in the number of codewords. The work in [13] attains real-time performance by using compact randomized tree signatures. With the development of binary descriptors such as BRIEF [7], BRISK [14], and ORB [18], binary features have been widely adopted for loop-closure detection (often with bag-of-words models) due to their fast computation and precision/recall comparable to SIFT and SURF. The new standard, a binary-features-based bag-of-words system, is presented in [11]. The IBuILD system [12] further improves the binary bag-of-words recipe with an online incremental design that needs no prior feature training. Our system improves upon IBuILD.

Besides standard feature descriptors, related research has sought to improve descriptors through additional learning processes, including LDA [6], [5], [25], [2], boosting [24], Principal Component Analysis (PCA) [26], Domain-Size Pooling [10], etc. Our work is inspired by [2], in which LDA is applied to locally learn a mask that minimizes the intra-class distance of binary descriptors. That algorithm synthesizes samples from each image by rotating the patch, and treats the original patch as a pivot patch along with the auxiliary samples to form a single class for LDA. In contrast, our method uses the synthesized patch as the pivot patch and the original patches as the auxiliary ones. More importantly, instead of learning invariance to some heuristic rotation transformation, our method learns invariance to the transformation from the actual robot motion.

III. LEARNING BINARY CODEWORDS INVARIANT TO FRAME-BY-FRAME MOTION DYNAMICS

This section first presents basics about binary feature descriptors and the LDA method for optimizing the descriptor by learning a mask. It then details the proposed algorithm for learning features from motion dynamics, and discusses the geometric properties of the learned codewords. Our notation uses boldface for a vector or matrix (e.g., x) and normal font for a scalar binary or real value (e.g., x_i).

A. Binary Descriptors and Intra-class Distance

1) Binary descriptors for a visual feature: Binary descriptors [7], [14], [18] follow the same basic formulation. Given an image intensity patch I, a binary descriptor is encoded as a binary vector x composed of L bits x_i ∈ B (typically L = 512). Often after a smoothing operation, each bit of x is generated by the binary tests {[a_i, b_i]}_{i=1}^{L}:

    x_i = { 1 if I(a_i) < I(b_i);  0 otherwise },  ∀ i = 1...L    (1)

where each a_i = [u_{a_i}, v_{a_i}]^T (and similarly b_i) is a pixel position. The binary test patterns are usually generated offline by training on large data sets. Here, we use the BOLD binary test pattern [2].

2) Intra-class distance: The distance between two binary descriptors is measured by the Hamming distance d_H with the bit-xor operation ⊕:

    d_H(x^{(k)}, x^{(k')}) = \sum_{l=1}^{L} x_l^{(k)} ⊕ x_l^{(k')} = \sum_{l=1}^{L} ( x_l^{(k)} - x_l^{(k')} )^2    (2)

For a set of image patches {I^{(k)}} from the same class and the corresponding binary descriptors {x^{(k)}}, the expected intra-class distance is:

    E[ d_H({x^{(k)}}) ] = (1/L) \sum_{l=1}^{L} E[ d_H({x_l^{(k)}}) ]
      = (1/(L K^2)) \sum_{l=1}^{L} \sum_{k=1}^{K} \sum_{k'=1}^{K} d_H( x_l^{(k)}, x_l^{(k')} )
      = (1/(L K^2)) \sum_{l=1}^{L} \sum_{k=1}^{K} \sum_{k'=1}^{K} ( x_l^{(k)} - x_l^{(k')} )^2
      = (1/(L K^2)) \sum_{l=1}^{L} ( 2 \sum_{k'=1}^{K} \sum_{k=1}^{K} (x_l^{(k)})^2 - 2 \sum_{k=1}^{K} \sum_{k'=1}^{K} x_l^{(k)} x_l^{(k')} )
      = (1/L) \sum_{l=1}^{L} ( 2 E[x_l^2] - 2 E[x_l]^2 )    (3)

B. Learning Codewords from Motion Dynamics

1) Minimizing intra-class distances with binary masks: Eq. 3 shows that minimizing the intra-class distance can be done by masking the binary coordinates with high variance, effectively projecting out the highly variable coordinates. We therefore package each learned codeword into a feature ensemble consisting of a feature descriptor and a binary mask, D = {x, y} ∈ D_MH, where the mask y is defined by

    y_i = { 1 if (∧_k x_i^{(k)} = 1) ∨ (∧_k x_i^{(k)} = 0);  0 otherwise }    (4)

for each coordinate i ∈ {1...L}. The distance between two feature ensembles with non-zero masks is defined to be a "masked Hamming distance" d_MH, given in Eq. 5.
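The binary-test descriptor of Eq. 1 and the agreement mask of Eq. 4 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random test pattern, patch size, and function names here are our own stand-ins (the paper uses the offline-trained BOLD pattern [2]).

```python
import numpy as np

def binary_descriptor(patch, tests):
    """Eq. 1: bit i is 1 iff patch(a_i) < patch(b_i) for test pair [a_i, b_i]."""
    a, b = tests[:, 0], tests[:, 1]
    return (patch[a[:, 0], a[:, 1]] < patch[b[:, 0], b[:, 1]]).astype(np.uint8)

def consensus_mask(descriptors):
    """Eq. 4: keep coordinate i only if all descriptors in the class agree there."""
    d = np.stack(descriptors)
    return (d.all(axis=0) | ~d.any(axis=0)).astype(np.uint8)

# Toy setup: L = 512 random tests on a 48x48 patch (illustrative pattern only).
rng = np.random.default_rng(0)
L, S = 512, 48
tests = rng.integers(0, S, size=(L, 2, 2))
patch = rng.integers(0, 256, size=(S, S)).astype(np.int32)

x = binary_descriptor(patch, tests)

# A slightly perturbed copy of the patch: the mask zeros out exactly the
# coordinates whose binary tests flip between the two versions.
patch2 = patch + rng.integers(-8, 9, size=(S, S))
x2 = binary_descriptor(patch2, tests)
y = consensus_mask([x, x2])
```

Masking with `y` then discards the unstable bits before any Hamming comparison, which is the projection that Eq. 3 motivates.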
Fig. 1. For each pair of matched frames, we extract a 48×48 patch I_1 from the previous frame and I_2 from the current frame. These two will be used to learn a codeword.

Algorithm 1: Learning codewords from motion dynamics.
  Data: I_1, I_2 ∈ R^{M×N} — image patches from a pair of matched features in two consecutive frames; {[a_i, b_i]} — list of binary test positions.
  Result: D_m = {x_m, y_m} — codeword of length L invariant to the perspective transformation between I_1, I_2.
  1: I_m ← (1/2)(I_1 + I_2)                    // O(MN)
  2: x_m ← BinaryTests(I_m, {[a_i, b_i]})      // Eq. 1, O(L)
  3: y_m ← MaskLearning({I_m, I_1, I_2})       // Eq. 7, O(L)

The masked Hamming distance d_MH between two feature ensembles D_1 = {x_1, y_1} and D_2 = {x_2, y_2} with non-zero masks is defined as

    d_MH(D_1, D_2) = ( |y_2| |x_1 ⊕ x_2 ∩ y_1| + |y_1| |x_1 ⊕ x_2 ∩ y_2| ) / ( |y_1| + |y_2| )
                   = ( |y_2| / (|y_1|+|y_2|) ) d^1_2 + ( |y_1| / (|y_1|+|y_2|) ) d^2_1    (5)

where |·| is the number of 1s in a binary vector, d^i_j ≜ |x_i ⊕ x_j ∩ y_i|, and in general d^i_j ≠ d^j_i.

The distance defined in Eq. 5 has two significant properties. (1) d_MH(D_1, D_2) ∈ [0, L]. The lower bound is straightforward, and the upper bound is given by d_MH(D_1, D_2) ≤ 2|y_1||y_2| / (|y_1|+|y_2|) ≤ 2|y_1||y_2| / (2\sqrt{|y_1||y_2|}) = \sqrt{|y_1||y_2|} ≤ L. (2) When y_1, y_2 are all 1s, d_MH(D_1, D_2) ≡ d_H(x_1, x_2), so the masked Hamming distance also accommodates the standard Hamming distance. The masked Hamming distance is not a metric, since the coincidence axiom and the triangle inequality do not hold.

2) Learning codewords invariant to cross-frame motion: Given a pair of matched image patches I_1, I_2 from two consecutive frames (as depicted in Fig. 1), the mean patch I_m is used to generate the codeword D_m = {x_m, y_m}:

    I_m = (1/2)(I_1 + I_2).    (6)

I_m carries the structural information of I_1 and I_2. Binary tests on I_1, I_2, and I_m, per Eq. 1, generate the binary vectors x_1, x_2, and x_m. The masks y_1 and y_2 are set to all 1s. The mask y_m is computed by minimizing the intra-class distance among {I_m, I_1, I_2}:

    y_{m,i} = { 1 if ∧_{k∈{1,2,m}} ( I_k(a_i) < I_k(b_i) ) = 1, or ∧_{k∈{1,2,m}} ( I_k(a_i) < I_k(b_i) ) = 0;  0 otherwise }    (7)

The algorithm generating the mean codeword is summarized in Algorithm 1. Fig. 2 depicts two binary tests on {I_m, I_1, I_2}: if non-zero variance exists in the same test across the three patches, the corresponding dimension of x_m is masked out. Fig. 3 illustrates the process in terms of bit-wise operations on binary vectors.

Fig. 2. Illustration of binary tests and mask learning for I_m from I_1, I_2.
Fig. 3. Codeword learning from the perspective of bit-wise operations.

C. Geometric Properties of Learned Codewords

Relative to the source codewords D_1 = {x_1, y_1} and D_2 = {x_2, y_2}, both in D_MH, the learned mean codeword has the following two nice geometric properties in the space D_MH induced by d_MH (note that ∀ D_k ∈ D_MH, |y_k| ≠ 0):
1) D_m can be viewed as the topological centroid of D_1 and D_2;
2) Given any other point D_k ∈ D_MH, D_m preserves the localities between D_k and D_m, and between D_k and D_1, D_2.
These properties are shown to hold in Theorems 1 and 2.

Theorem 1: Let D_m be the codeword generated from D_1 and D_2. Then

    d_MH(D_m, D_1) ≤ d_MH(D_1, D_2)  and  d_MH(D_m, D_2) ≤ d_MH(D_1, D_2).    (8)

Proof: Here we prove the first inequality. From Eq. 7, d^m_1 = 0, since x_m agrees with x_1 on every unmasked coordinate of y_m. Also, because I_m is the mean patch of I_1 and I_2, |x_m ⊕ x_1| ≤ |x_2 ⊕ x_1|, which further implies d^1_m = |(x_m ⊕ x_1) ∩ y_1| ≤ |(x_2 ⊕ x_1) ∩ y_1| = d^1_2. In addition, y_1 = y_2 ⟹ d^1_2 = d^2_1. Therefore,

    d_MH(D_m, D_1) = ( |y_m| / (|y_1|+|y_m|) ) d^1_m + ( |y_1| / (|y_1|+|y_m|) ) d^m_1
                   = ( |y_m| / (|y_1|+|y_m|) ) d^1_m
                   ≤ ( |y_m| / (|y_1|+|y_m|) ) d^1_2
                   ≤ d^1_2
                   = ( |y_2| / (|y_1|+|y_2|) ) d^1_2 + ( |y_1| / (|y_1|+|y_2|) ) d^2_1
                   = d_MH(D_1, D_2)    (9)

Likewise, d_MH(D_m, D_2) ≤ d_MH(D_1, D_2) also holds. ∎

Theorem 2: Let D_m be the codeword generated from D_1 and D_2. Then ∀ D_k ∈ D_MH, the following inequality holds:

    d_MH(D_k, D_1) + d_MH(D_k, D_2) ≥ λ d_MH(D_k, D_m),    (10)

where

    λ = ( |y_k| / (L + |y_k|) ) [ 1 + min(|y_m|, |y_k|) / max(|y_m|, |y_k|) ] ∈ [0, 1].    (11)

Proof: First consider codewords with only one bit. There are two cases.

Case 1: If x_1 and x_2 have the same value, WLOG assume this value is 1; then x_m = y_m = 1. Since x_k, y_k ∈ {0, 1}, there are four situations, listed in Table I.

Case 2: If x_1 and x_2 have different values, WLOG assume x_1 = 1, x_2 = 0; then y_m = 0, and x_m can be 1 or 0. Due to symmetry, we only need to consider one of these situations; let x_m = 0. There are again four situations, listed in Table II.

TABLE I. DISTANCES OF CODEWORDS WITH ONE BIT (CASE 1). FOR SIMPLICITY WE USE d_MH(i, j) TO DENOTE d_MH(D_i, D_j). ENTRIES ARE (x, y) PAIRS.

D_1    D_2    D_m    D_k    d_MH(1,k)  d_MH(2,k)  d_MH(m,k)
(1,1)  (1,1)  (1,1)  (1,1)  0          0          0
(1,1)  (1,1)  (1,1)  (1,0)  0          0          0
(1,1)  (1,1)  (1,1)  (0,1)  1          1          1
(1,1)  (1,1)  (1,1)  (0,0)  1          1          1

In sum, in any one-dimensional case it always holds that

    d_MH(m, k) ≤ d_MH(1, k) + d_MH(2, k)
    ⟺ d^m_k + d^k_m ≤ d^1_k + d^k_1 + d^2_k + d^k_2    (12)

For any descriptor with L dimensions, d^m_k + d^k_m ≤ d^1_k + d^k_1 + d^2_k + d^k_2 still holds, because each d^i_j is simply a summation over all dimensions without weighting.

Also, considering |y_k| ≤ |y_1| ≡ |y_2| = L, we have

    d_MH(D_1, D_k) + d_MH(D_2, D_k)
      = ( |y_k| d^1_k + L d^k_1 ) / (L + |y_k|) + ( |y_k| d^2_k + L d^k_2 ) / (L + |y_k|)
      ≥ ( |y_k| / (L + |y_k|) ) ( d^1_k + d^k_1 + d^2_k + d^k_2 )
      ≥ ( |y_k| / (L + |y_k|) ) ( d^m_k + d^k_m )
      = ( |y_k| / (L + |y_k|) ) · ( (|y_k|+|y_m|) / max(|y_k|, |y_m|) )
        · ( ( max(|y_k|, |y_m|) / (|y_k|+|y_m|) ) d^m_k + ( max(|y_k|, |y_m|) / (|y_k|+|y_m|) ) d^k_m )
      ≥ ( |y_k| / (L + |y_k|) ) · ( (|y_k|+|y_m|) / max(|y_k|, |y_m|) )
        · ( ( |y_k| / (|y_k|+|y_m|) ) d^m_k + ( |y_m| / (|y_k|+|y_m|) ) d^k_m )
      = ( |y_k| / (L + |y_k|) ) [ 1 + min(|y_m|, |y_k|) / max(|y_m|, |y_k|) ] d_MH(D_k, D_m)
      ≜ λ d_MH(D_k, D_m)    (13)

It is easy to see that λ > 0 (λ ≠ 0 because |y_k| ≠ 0). On the other hand, λ = ( 1 / (L/|y_k| + 1) ) [ 1 + min(|y_m|, |y_k|) / max(|y_m|, |y_k|) ] ≤ ( 1 / (L/L + 1) ) [ 1 + 1 ] = 1. Thus, λ ∈ [0, 1]. ∎
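As a numerical sanity check of Theorems 1 and 2, the following sketch builds D_m from a simulated matched pair (applying the mask rule directly to the descriptors, as the system implementation does) and verifies the centroid and locality inequalities for the masked Hamming distance of Eq. 5. Function and variable names are our own.

```python
import numpy as np

def d_mh(x1, y1, x2, y2):
    """Eq. 5: masked Hamming distance between ensembles D1={x1,y1}, D2={x2,y2}."""
    diff = x1 ^ x2
    d12 = int(np.sum(diff & y1))          # d^1_2 = |x1 xor x2 intersect y1|
    d21 = int(np.sum(diff & y2))          # d^2_1
    n1, n2 = int(y1.sum()), int(y2.sum())
    return (n2 * d12 + n1 * d21) / (n1 + n2)

rng = np.random.default_rng(1)
L = 512
x1 = rng.integers(0, 2, L).astype(np.uint8)
x2 = x1.copy()
x2[rng.choice(L, 40, replace=False)] ^= 1     # matched pair: mostly agreeing bits
y1 = y2 = np.ones(L, dtype=np.uint8)

# Mask rule on descriptors: keep bits where x1 and x2 agree; x_m takes the
# common value there (its value on masked-out bits is irrelevant).
ym = (x1 == x2).astype(np.uint8)
xm = x1 & ym

# Theorem 1: D_m is a topological centroid of D_1, D_2.
assert d_mh(xm, ym, x1, y1) <= d_mh(x1, y1, x2, y2)
assert d_mh(xm, ym, x2, y2) <= d_mh(x1, y1, x2, y2)

# Theorem 2: locality with respect to an arbitrary third codeword D_k.
xk = rng.integers(0, 2, L).astype(np.uint8)
yk = np.ones(L, dtype=np.uint8)
lam = (yk.sum() / (L + yk.sum())) * (1 + min(ym.sum(), yk.sum()) / max(ym.sum(), yk.sum()))
lhs = d_mh(xk, yk, x1, y1) + d_mh(xk, yk, x2, y2)
assert lhs >= lam * d_mh(xk, yk, xm, ym)
```

The assertions hold by Theorems 1 and 2 for any such construction; the random draws only pick one concrete instance.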
TABLE II. DISTANCES OF CODEWORDS WITH ONE BIT (CASE 2). ENTRIES ARE (x, y) PAIRS.

D_1    D_2    D_m    D_k    d_MH(1,k)  d_MH(2,k)  d_MH(m,k)
(1,1)  (0,1)  (0,0)  (1,1)  0          1          1
(1,1)  (0,1)  (0,0)  (1,0)  0          1          0
(1,1)  (0,1)  (0,0)  (0,1)  1          0          0
(1,1)  (0,1)  (0,0)  (0,0)  1          0          0

Fig. 4. Topologically, D_m is the centroid of D_1 and D_2 in D_MH.

Remark 1: Theorem 1 shows that D_m can be seen as the topological centroid of D_1 and D_2, as depicted in Fig. 4.

Remark 2: The limits λ ∈ [0, 1] in Theorem 2 are conservative, since they do not factor in that D_1, D_2 are matched features. In reality, x_1 ≈ x_2 ⟹ |y_m| ≈ L. In practice, D_k will also be generated from a matched pair, so |y_k| ≈ L ≈ |y_m| as well, and we conclude λ ≈ (1/2)(1 + 1) = 1.

Remark 3: Intuitively, Theorem 2 shows that locality is preserved when using D_m as a proxy. The theorem can be interpreted as: "if D_k is far away from D_m, then D_k is also far away from D_1 and (or) D_2." This is important to the loop-closure application: if the matching between two codewords D_k and D_m is rejected and no loop-closure hypothesis is triggered, then a real loop-closure is not likely to exist, because D_k must be unable to match the original features D_1, D_2 up to the factor λ.

D. Temporal Constraints on Loop-closure Hypotheses

Here we discuss a technique for improving the detection precision, which comes naturally from the learning process of a codeword. As illustrated in Fig. 5, a loop-closure happens around frame p, which revisits the place captured previously by the sequence around frame k. A loop-closure hypothesis closing frame p+1 with frame k+1 is triggered in the bag-of-words framework because these two frames share plenty of matched codewords. Assume two codewords D_m^{(k+1)} and D_m^{(p+1)} are matched due to the fact that (at least) features f_k and f_p are strongly matched with each other.

If feature f_k is stable across frames k−1 and k, and it is also matched between these two frames, then the codeword D_m^{(k)} generated from f_k for frame k should also match strongly with D_m^{(p+1)}. If there is a sufficient number of such stable features across frames k−1 and k, a loop-closure hypothesis should also be triggered by matching frame p+1 with frame k. This results in at least two consecutive frame indices existing in the hypothesis list.

Therefore, we impose a temporal constraint on the loop-closure hypotheses generated for a frame: a hypothesis with frame k is accepted if and only if either frame k−1 or frame k+1 is also retrieved as a hypothesis.

The temporal constraint on hypotheses reduces the false positive rate and leads to improved detection precision. Fig. 6 illustrates an example. The true loop-closure exists at frame 356. The hypothesis list without temporal constraints includes frame 54 (likelihood 0.74), frame 356 (likelihood 0.16), and frame 357 (likelihood 0.10). The temporal constraint rejects the false positive hypothesis, frame 54, since neither frame 53 nor frame 55 is in the hypothesis list. After renormalization, the temporally constrained hypotheses become frame 356 (likelihood 0.60) and frame 357 (likelihood 0.40). Frame 356 is therefore retrieved as the final loop-closure index.

Fig. 5. Illustration of the intrinsic temporal constraint on loop-closure hypotheses.
Fig. 6. Temporal constraints reject the false-positive hypothesis, frame 54. In this example, the true loop-closure frame is 356.

IV. LOOP-CLOSURE DETECTION SYSTEM

Our final system design is based on the state-of-the-art system IBuILD [12], but with the integration of the proposed algorithm, as depicted in Fig. 7. The system is an incremental bag-of-words system without any prior offline training process. Here we discuss some details of the system.

Fig. 7. Diagram of the final loop-closure detection system.
Fig. 8. Example frames in the CityCentre data set.
Fig. 9. Example frames in the Malaga09_6L data set.
Fig. 10. Example frames in the New College data set.
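The temporal constraint above amounts to a single-pass filter over the hypothesis list followed by renormalization. A minimal sketch (function and variable names are ours), reproducing the rejection step of the Fig. 6 example:

```python
def temporal_filter(hypotheses):
    """Keep a hypothesis for frame k only if frame k-1 or k+1 is also
    hypothesized (Sec. III-D), then renormalize the likelihoods."""
    frames = set(hypotheses)
    kept = {k: w for k, w in hypotheses.items()
            if (k - 1) in frames or (k + 1) in frames}
    total = sum(kept.values())
    return {k: w / total for k, w in kept.items()} if total else {}

# Fig. 6 example: frame 54 is rejected (neither 53 nor 55 is present);
# frames 356 and 357 survive and are renormalized.
hyps = {54: 0.74, 356: 0.16, 357: 0.10}
filtered = temporal_filter(hyps)
best = max(filtered, key=filtered.get)   # best == 356, the final loop-closure index
```

A simple proportional renormalization is used here for illustration; the likelihood model producing the paper's reported 0.60/0.40 values belongs to the full bag-of-words pipeline.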
• The initial feature matching is based on raw binary descriptors extracted from visual keypoints (FAST is used). The raw binary descriptors use the same binary tests as the codeword learning. A local search on the next frame is then performed to find the best matched features. This step would come for free in an actual visual SLAM system.
• The masked Hamming distance is used for distance evaluation in all modules except the initial feature matching.
• Codeword generation follows Algorithm 1. In the actual implementation, since x_1 and x_2 are already known from the previous steps, they are used directly to generate the mask y_m instead of the raw image patches I_1, I_2.
• A codeword merging step is performed when the learned codewords are used to update the vocabulary. This step iterates through all the codewords generated for the current frame and finds the codeword pairs within matching thresholds. These pairs are then merged according to Algorithm 1, but treating the two codewords to merge as D_1 and D_2. Contrast this merge to [12], which takes the "numerical centroid"; for two binary vectors, the "numerical centroid" is equivalent to the bit-wise OR operation.
• The temporal constraint discussed in Section III-D filters the hypothesis list before the loop-closure likelihood calculations. It is performed with a single-pass scan of the hypothesis list.
• The hypothesis with the highest likelihood is output as the final loop-closure index. We perform a temporal consistency check with k = 2 (as in [11], Section VI.C).

Fig. 11. Example frames in the Ford Campus 2 data set.

V. EVALUATION

Evaluation used benchmark data sets. To obtain reasonable parameters, we first performed a coarse-grain parameter sweep on the CityCentre set. The experiments on the other data sets involved modifying the matching threshold only. Performance is compared to three other methods: Fab-Map 2.0 [9], bag-of-binary-words [11], and IBuILD [12]. The machine used was a Linux desktop with an Intel Core i5 quad-core 2.8 GHz CPU and 8 GB of memory.

A. Data sets and evaluation tool

The data sets used include four challenging sets: CityCentre [8], Malaga09_6L [4], New College [20], and Ford Campus 2 [17]. Table III summarizes their characteristics. For benchmarking, the evaluation scripts and ground truth files from the authors of [11] are used.

Figures 8 to 11 visualize some matched feature pairs used in learning codewords. Notice that in the Ford Campus 2 set, part of the vehicle is always visible; we kept only the paired keypoints with pixel coordinate v ≤ 1200.

TABLE III. DATA SETS USED FOR EVALUATION.

Dataset              | Environment                      | Distance (m) | Sensor position                  | Image resolution | Number of frames
CityCentre [8]       | Outdoor, urban, dynamic          | 2025         | Lateral (left and right cameras) | 640×480          | 2474
Malaga09_6L [4]      | Outdoor, slightly dynamic        | 1192         | Frontal                          | 1024×768         | 869
New College [20]     | Outdoor, dynamic                 | 2260         | Frontal                          | 512×384          | 5266
Ford Campus 2 [17]   | Outdoor, urban, slightly dynamic | 4004         | Frontal                          | 600×1600         | 1182

B. Experiment and parameter sweep on the CityCentre set

We chose the CityCentre set for a coarse-grain parameter sweep because it is a difficult data set on which to attain high recall with 100% precision, as will be shown in Table V. The dimension L of tests and masks is fixed to 512. The parameter sweep evaluated four parameters: (1) the matching threshold Ψ for d_MH, (2) the keypoint detection threshold Υ, (3) the maximum number of matched pairs allowed, Γ, and (4) the number of local frames excluded, T_local. Among these, we observed that T_local can cover a large range, i.e. 20–50, with little change in the results. Moreover, Γ mainly controls the size and growth of the vocabulary by limiting the accepted pairs of initially matched features; since it trades off efficiency against precision/recall, we set its value to 100.

The most important parameters are the matching threshold Ψ and the detection threshold Υ. Figure 12 plots the precision-recall curves with Ψ = [8, 10, 12, 15, 18, 20, 22, 25] under Υ = [20, 35, 50]. The best recall with 100% precision happens when Ψ = 18, Υ = 35, as listed in Table IV.

Fig. 12. Precision-recall under different detection thresholds on the CityCentre set. Each precision-recall curve for a given detection threshold is plotted by changing the matching threshold for the masked Hamming distance.

TABLE IV. THE PARAMETER SET WHICH GIVES THE BEST RECALL WITH 100% PRECISION ON THE CITYCENTRE SET.

Parameters                                   | Values
Matching threshold for d_MH, Ψ               | 18
Keypoint detection threshold, Υ              | 35
Maximum number of matched pairs allowed, Γ   | 100
Binary test dimensions, L                    | 512
Number of local frames excluded, T_local     | 20

C. Experiments on all four data sets

In these experiments, precision and recall are evaluated by changing Ψ while keeping Υ = 35, Γ = 100, T_local = 20. The precision-recall curves of our approach are depicted in Figure 13.

Fig. 13. Precision-recall on the different data sets using the proposed approach.

Finally, a comparison is provided of our approach to the other approaches [9], [11], [12], focusing on the best recall under 100% precision. The results of the other approaches are taken directly from the referred publications. In particular, for the bag-of-binary-words method, the New College and Ford Campus 2 data sets are used as training sets with parameter searching, while CityCentre and Malaga09_6L are used as testing sets with fixed parameters. For Fab-Map 2.0 on Malaga09_6L, only 462 images are used [11]. It can be observed that, except on Ford Campus 2, our approach has the highest recall under 100% precision.

TABLE V. PERFORMANCE (PRECISION/RECALL) OF DIFFERENT APPROACHES. (FAB-MAP 2.0 ON MALAGA09_6L USED 462 IMAGES.)

Dataset        | Bag of binary words [11] | Fab-Map 2.0 [9] | IBuILD [12]    | Ours
CityCentre     | 100% / 38.77%            | 100% / 30.61%   | 100% / 38.92%  | 100% / 41.18%
Malaga09_6L    | 100% / 68.52%            | 100% / 74.75%   | 100% / 78.13%  | 100% / 82.61%
New College    | Not available            | 100% / 55.92%   | Not available  | 100% / 59.20%
Ford Campus 2  | Not available            | 100% / 79.45%   | Not available  | 100% / 78.92%

D. Timing

We collected timing statistics for learning one codeword via Algorithm 1 from the CityCentre experiments. The timing results under normal system process priority are listed in Table VI, demonstrating high efficiency and stability. In our implementation, the bit-wise operations are handled using the C++ std::transform function with bit-operation functors (e.g., std::bit_xor) on uchar. The number of 1s in a binary vector is counted directly using a look-up table indexed by uchar values.

TABLE VI. TIME (IN 10⁻⁶ SEC) USED FOR LEARNING ONE CODEWORD USING ALGORITHM 1.

Stat.      | Mean  | Standard Dev. | Min   | Max
Time Used  | 14.60 | 0.76          | 13.10 | 16.04

VI. CONCLUSION

This work described a method to learn binary codewords online for loop-closure detection. The codewords are learned efficiently in an LDA fashion from matched feature pairs in two consecutive frames, such that the learned codewords encode temporal perspective invariance from the observed motion dynamics. The geometric properties of the learned codewords are mathematically justified. The temporal consistency arising from the nature of the learned codewords is further exploited to cull loop-closure hypotheses. The final incremental system is evaluated with precision/recall and timing results, which demonstrate the effectiveness and efficiency of the approach.

ACKNOWLEDGMENT

We would like to thank the Summer Undergraduate Research in Engineering (SURE) grant (NSF award number EEC-1263049) for supporting Mason Lilly. We sincerely thank the authors of [12] for sharing the main components of their IBuILD implementation, and the authors of [11] for providing their evaluation scripts and ground truth.

REFERENCES

[1] A. Angeli, D. Filliat, S. Doncieux, and J.-A. Meyer. Fast and incremental method for loop-closure detection using bags of visual words. IEEE Transactions on Robotics, 24(5):1027–1037, 2008.
[2] V. Balntas, L. Tang, and K. Mikolajczyk. BOLD – binary online learned descriptor for efficient image matching. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2367–2375, 2015.
[3] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346–359, 2008.
[4] J. Blanco, F. Moreno, and J. Gonzalez. A collection of outdoor robotic datasets with centimeter-accuracy ground truth. Autonomous Robots, 27(4):327–351, 2009.
[5] M. Brown, G. Hua, and S. Winder. Discriminative learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):43–57, 2011.
[6] H. Cai, K. Mikolajczyk, and J. Matas. Learning linear discriminant projections for dimensionality reduction of image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):338–352, 2011.
[7] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary robust independent elementary features. In European Conference on Computer Vision, pages 778–792. Springer, 2010.
[8] M. Cummins and P. Newman. FAB-MAP: Probabilistic localization and mapping in the space of appearance. The International Journal of Robotics Research, 27(6):647–665, 2008.
[9] M. Cummins and P. Newman. Appearance-only SLAM at large scale with FAB-MAP 2.0. The International Journal of Robotics Research, 30(9):1100–1123, 2011.
[10] J. Dong and S. Soatto. Domain-size pooling in local descriptors: DSP-SIFT. In IEEE Conference on Computer Vision and Pattern Recognition, pages 5097–5106. IEEE, 2015.
[11] D. Gálvez-López and J. Tardós. Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics, 28(5):1188–1197, Oct 2012.
[12] S. Khan and D. Wollherr. IBuILD: Incremental bag of binary words for appearance based loop closure detection. In IEEE International Conference on Robotics and Automation, pages 5441–5447, May 2015.
[13] K. Konolige, J. Bowman, J. Chen, P. Mihelich, M. Calonder, V. Lepetit, and P. Fua. View-based maps. The International Journal of Robotics Research, 2010.
[14] S. Leutenegger, M. Chli, and R. Y. Siegwart. BRISK: Binary robust invariant scalable keypoints. In IEEE International Conference on Computer Vision, pages 2548–2555. IEEE, 2011.
[15] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[16] R. Mur-Artal, J. Montiel, and J. D. Tardós. ORB-SLAM: a versatile and accurate monocular SLAM system. arXiv preprint arXiv:1502.00956, 2015.
[17] G. Pandey, J. McBride, and R. Eustice. Ford campus vision and lidar data set. The International Journal of Robotics Research, 30(13):1543–1552, 2011.
[18] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: an efficient alternative to SIFT or SURF. In IEEE International Conference on Computer Vision, pages 2564–2571. IEEE, 2011.
[19] K. Simonyan, A. Vedaldi, and A. Zisserman. Learning local feature descriptors using convex optimisation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(8):1573–1585, 2014.
[20] M. Smith, I. Baldwin, W. Churchill, R. Paul, and P. Newman. The New College vision and laser data set. The International Journal of Robotics Research, 28(5):595–599, 2009.
[21] D. Cho, P. Tsiotras, G. Zhang, and M. Holzinger. Robust feature detection, acquisition and tracking for relative navigation in space with a known target. In AIAA Guidance, Navigation, and Control Conference, 2013.
[22] G. Zhang, P. A. Vela, P. Tsiotras, and D. Cho. Efficient closed-loop detection and pose estimation for vision-only relative localization in space with a cooperative target. In AIAA SPACE Conference and Exposition, 2014.
[23] G. Zhang, M. Kontitsis, N. Filipe, P. Tsiotras, and P. A. Vela. Cooperative relative navigation for space rendezvous and proximity operations using controlled active vision. Journal of Field Robotics, 2015.
[24] T. Trzcinski, M. Christoudias, P. Fua, and V. Lepetit. Boosting binary keypoint descriptors. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2874–2881. IEEE, 2013.
[25] T. Trzcinski and V. Lepetit. Efficient discriminative projections for compact binary descriptors. In European Conference on Computer Vision, pages 228–242. Springer, 2012.
[26] S. Winder, G. Hua, and M. Brown. Picking the best DAISY. In IEEE Conference on Computer Vision and Pattern Recognition, pages 178–185. IEEE, 2009.
[27] G. Zhang and P. A. Vela. Good features to track for visual SLAM. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1373–1382, 2015.
[28] G. Zhang and P. A. Vela. Optimally observable and minimal cardinality monocular SLAM. In IEEE International Conference on Robotics and Automation, pages 5211–5218, 2015.
