Imperial College London
Department of Computing

Local Accuracy and Global Consistency for Efficient Visual SLAM

Hauke Strasdat

October 2012

Supervised by Dr. Andrew Davison

Submitted in part fulfilment of the requirements for the degree of PhD in Computing and the Diploma of Imperial College London. This thesis is entirely my own work, and, except where otherwise indicated, describes my own research.

To So-Rim Lee

Abstract

This thesis is concerned with the problem of Simultaneous Localisation and Mapping (SLAM) using visual data only. Given the video stream of a moving camera, we wish to estimate the structure of the environment and the motion of the device as accurately as possible and in real time. Two effective approaches were presented in the past. Filtering methods marginalise out past poses and summarise the information gained over time with a probability distribution. Keyframe methods rely on the optimisation approach of bundle adjustment, but for computational reasons must select only a small number of past frames to process. We perform a rigorous comparison between the two approaches for visual SLAM. Specifically, we show that accuracy comes from a large number of points, while the number of intermediate frames has only a minor impact. We conclude that keyframe bundle adjustment is superior to filtering due to a smaller computational cost. Based on these experimental results, we develop an efficient framework for large-scale visual SLAM using the keyframe strategy. We demonstrate that SLAM using a single camera drifts not only in rotation and translation, but also in scale. In particular, we perform large-scale loop closure correction using a novel variant of pose-graph optimisation which also takes scale drift into account. Starting from this two-stage approach, which tackles local motion estimation and loop closures separately, we develop a unified framework for real-time visual SLAM.
By employing a novel double window scheme, we present a constant-time approach which enables the local accuracy of bundle adjustment while ensuring global consistency. Furthermore, we suggest a new scheme for local registration using metric loop closures and present several improvements for the visual front-end of SLAM. Our contributions are evaluated exhaustively on a number of synthetic experiments and real-image data-sets from single cameras and range imaging devices.

Acknowledgements

First and foremost, I would like to express my sincere gratitude to my supervisor Andrew Davison for his enduring support and inspiration. He invested lots of time and provided me with helpful comments and critical remarks. I am very thankful to my colleague and unofficial second adviser José María M. Montiel from Zaragoza. I appreciate all the fruitful discussions we had. Many thanks to Kurt Konolige, whom I visited in autumn 2010 at Willow Garage, for the collaborations during the final stages of my research. I would also like to thank my previous advisors and mentors Martin Riedmiller, Sven Behnke, Cyrill Stachniss and Wolfram Burgard. They sparked my interest in computer vision/robotics and provided me with foundations essential for doctoral studies.

It was a great pleasure to do a PhD in such an inspiring and fertile environment. I wish to express my gratitude to past and current members and visitors of the Robot Vision Group. To Ankur Handa and his mathematical skills. To Steven Lovegrove; we had many chats about research, tools, programming languages and more. To Adrien Angeli, who aided me with appearance-based loop closure detection. To Margarita Chli, Richard Newcombe, Renato Salas-Moreno, Gerardo Carrera, Javier Civera, Pablo Fernandez, Stefan Holzer, Klaus Strobl, Jan Jachnik, Jacek Zienkiewicz and Robert Lukierski.

I had the great opportunity to exchange ideas with various experts in the field. Thanks to Ethan Eade for the email correspondence about filter implementations and Lie theory.
Thanks to Giorgio Grisetti and Rainer Kümmerle for the discussions about efficient optimisation. Thanks to Gabe Sibley and Christopher Mei for discussing visual SLAM and sharing details of their implementations.

I treasure the time I spent in London. My thanks go to all my friends and colleagues at Imperial College who made my studies an enjoyable experience. Cheers to the Black Lions.

I am very grateful to my family who supported me during my studies and beyond.

Contents

1 Introduction
  1.1 Mobile Robotics and Real-time SLAM
  1.2 Vision
  1.3 A Brief Review of Visual SLAM
  1.4 Efficiency, Accuracy and Consistency
  1.5 Contributions
  1.6 Publications
  1.7 Structure

2 Preliminaries
  2.1 Some Revision of Calculus
  2.2 Introduction to Optimisation
  2.3 Probabilistic State Estimation and Filtering
  2.4 Lie Groups
  2.5 Summary

3 Monocular Exploration
  3.1 Monocular SLAM and Exploration
  3.2 Camera Model
  3.3 Optimization Back-end
  3.4 Visual Front-end
  3.5 Qualitative Experiment
  3.6 Summary
  3.7 Bibliographic Remarks

4 Visual SLAM: Why Filter?
  4.1 Filtering versus Bundle Adjustment
  4.2 Experimental Design
  4.3 Preliminary Experiment
  4.4 Bundle Adjustment and Filter Variants
  4.5 Implementation of Visual SLAM
  4.6 Experiments
  4.7 Discussion
  4.8 Bibliographic Remarks

5 Scale Drift-Aware Large Scale Monocular SLAM
  5.1 Gauge Freedoms, Monocular SLAM and Scale Drift
  5.2 The Group of Similarity Transformations
  5.3 Loop Closure
  5.4 Experiments
  5.5 Summary
  5.6 Bibliographic Remarks

6 Double Window Optimisation
  6.1 Optimisation for Visual SLAM
  6.2 Double Window Optimisation Framework
  6.3 Visual Frontends
  6.4 Loop Closures
  6.5 Experiments
  6.6 Discussion and Summary
  6.7 Bibliographic Remarks

7 Conclusion
  7.1 Discussion and Future Work

A Proofs and Formulae related to Lie Groups
  A.1 Generators
  A.2 Adjoint Representations
  A.3 Lie brackets
  A.4 The Campbell-Baker-Hausdorff Formula
  A.5 Exponential Map onto Sim(3)
  A.6 Derivative of the Lie Logarithm

B Jacobians
  B.1 Projections and Camera Forward Models
  B.2 Pose-Point Transformations
  B.3 Inverse Depth Point Transformations
  B.4 Bundle Adjustment
  B.5 Anchored Inverse Depth Bundle Adjustment
  B.6 Pose-graph Optimisation

Bibliography