Representing 3D models for alignment and recognition Mathieu Aubry To cite this version: MathieuAubry. Representing3Dmodelsforalignmentandrecognition. ComputerVisionandPattern Recognition [cs.CV]. Ecole normale supérieure - ENS PARIS, 2015. English. NNT: 2015ENSU0006. tel-01160300v2 HAL Id: tel-01160300 https://theses.hal.science/tel-01160300v2 Submitted on 11 Apr 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. (cid:1)(cid:1)(cid:1)(cid:2)(cid:2)(cid:3)(cid:3)(cid:6)(cid:6)(cid:7)(cid:7)(cid:8)(cid:8)(cid:9)(cid:9)(cid:7)(cid:7)(cid:8)(cid:8)(cid:10)(cid:10)(cid:11)(cid:11)(cid:12)(cid:12)(cid:13)(cid:13)(cid:11)(cid:11)(cid:14)(cid:14)(cid:15)(cid:15)(cid:13)(cid:13) (cid:1)(cid:2)(cid:3)(cid:4)(cid:5)(cid:6)(cid:3)(cid:7)(cid:6)(cid:3)(cid:8)(cid:9)(cid:10)(cid:11)(cid:12)(cid:6)(cid:2)(cid:12)(cid:13)(cid:10)(cid:2)(cid:3)(cid:7)(cid:5)(cid:3)(cid:14)(cid:15)(cid:16)(cid:7)(cid:6)(cid:3)(cid:7)(cid:6) (cid:17)(cid:18)(cid:19)(cid:20)(cid:1)(cid:21)(cid:22) (cid:17)(cid:1)(cid:3)(cid:23)(cid:9)(cid:24)(cid:19)(cid:18)(cid:23)(cid:1)(cid:3)(cid:26)(cid:18)(cid:22)(cid:27)(cid:28)(cid:23)(cid:1)(cid:3)(cid:29)(cid:21)(cid:30)(cid:24)(cid:22)(cid:31)(cid:1)(cid:21)(cid:22)(cid:1) (cid:24)(cid:32)(cid:10)(cid:8)(cid:6)(cid:3)(cid:7)(cid:10)(cid:32)(cid:12)(cid:10)(cid:15)(cid:16)(cid:8)(cid:6) (cid:3) ED 386 : Sciences mathématiques de Paris Centre (cid:17)(cid:13)(cid:33)(cid:32)(cid:13)(cid:34)(cid:8)(cid:13)(cid:2)(cid:6)(cid:3)(cid:10)(cid:5)(cid:3)(cid:33)(cid:34)(cid:35)(cid:32)(cid:13)(cid:16)(cid:8)(cid:13)(cid:12)(cid:35)(cid:3)(cid:36) Informatique (cid:30)(cid:15)(cid:35)(cid:33)(cid:6)(cid:2)(cid:12)(cid:35)(cid:6)(cid:3)(cid:6)(cid:12)(cid:3)(cid:33)(cid:10)(cid:5)(cid:12)(cid:6)(cid:2)(cid:5)(cid:6)(cid:3)(cid:34)(cid:16)(cid:15)(cid:3)(cid:36) Mathieu Aubry (cid:8)(cid:6)(cid:3)8 mai 2015 (cid:20)(cid:13)(cid:12)(cid:15)(cid:6) Representing 3D models for alignment and recognition (cid:21)(cid:2)(cid:13)(cid:12)(cid:35)(cid:3)(cid:7)(cid:6)(cid:3)(cid:15)(cid:6)(cid:32)(cid:37)(cid:6)(cid:15)(cid:32)(cid:37)(cid:6)(cid:3) équipe WILLOW (CNRS/ENS/INRIA UMR 8548) (cid:20)(cid:37)(cid:38)(cid:33)(cid:6)(cid:3)(cid:7)(cid:13)(cid:15)(cid:13)(cid:14)(cid:35)(cid:6)(cid:3)(cid:34)(cid:16)(cid:15)(cid:3) Daniel Cremers (TU Munich) Josef Sivic (INRIA-ENS) (cid:3) (cid:27)(cid:6)(cid:40)(cid:11)(cid:15)(cid:6)(cid:33)(cid:3)(cid:7)(cid:5)(cid:3)(cid:41)(cid:5)(cid:15)(cid:42)(cid:3) (cid:3) (cid:3) (cid:3) Jean Ponce (ENS) Léonidas Guibas (Stanford) (cid:3) Bryan Russell (Adobe) Martial Hebert (CMU) (cid:3) (cid:3) (cid:3) (cid:1)(cid:2)(cid:3)(cid:4)(cid:7)(cid:8)(cid:9)(cid:10)(cid:11)(cid:12)(cid:13)(cid:14)(cid:10)(cid:15)(cid:10)(cid:16)(cid:13)(cid:14)(cid:9)(cid:11)(cid:12)(cid:9)(cid:17)(cid:16)(cid:9)(cid:18)(cid:19)(cid:20)(cid:22)(cid:12)(cid:9)(cid:23) 2 Abstract Thanks to the success of 3D reconstruction algorithms and the devel- opment of online tools for computer-aided design (CAD) the number of publicly available 3D models has grown significantly in recent years, and will continue to do so. This thesis investigates representations of 3D models for 3D shape matching, instance-level 2D-3D alignment, and category-level 2D-3D recognition. The geometry of a 3D shape can be represented almost completely by the eigen-functions and eigen-values of the Laplace-Beltrami operator on the shape. We use this mathematically elegant representation to char- acterize points on the shape, with a new notion of scale. This 3D point signature can be interpreted in the framework of quantum mechanics and we call it the Wave Kernel Signature (WKS). We show that it has ad- vantages with respect to the previous state-of-the-art shape descriptors, and can be used for 3D shape matching, segmentation and recognition. — M. Aubry : Representing 3D models for alignment and recognition — 3 A key element for understanding images is the ability to align an object depicted in an image to its given 3D model. We tackle this instance level 2D-3D alignment problem for arbitrary 2D depictions including drawings, paintings, and historical photographs. This is a tremendously difficult task as the appearance and scene structure in the 2D depic- tions can be very different from the appearance and geometry of the 3D model, e.g., due to the specific rendering style, drawing error, age, lighting or change of seasons. We represent the 3D model of an entire architectural site by a set of visual parts learned from rendered views of the site. We then develop a procedure to match those scene parts that we call 3D discriminative visual elements to the 2D depiction of the ar- chitectural site. We validate our method on a newly collected dataset of non-photographic and historical depictions of three architectural sites. We extend this approach to describe not only a single architectural site but an entire object category, represented by a large collection of 3D CAD models. We develop a category-level 2D-3D alignment method that not only detects objects in cluttered images but also identifies their approximate style and viewpoint. We evaluate our approach both qual- — M. Aubry : Representing 3D models for alignment and recognition — 4 itatively and quantitatively on a subset of the challenging Pascal VOC 2012 images of the “chair” category using a reference library of 1394 CAD models downloaded from the Internet. — M. Aubry : Representing 3D models for alignment and recognition — 5 Acknowledgments I want first to thank all the people I will never forget but cannot list who inspired, supported or advised me, shared coffees, drinks or thoughts with me, and made those last years as rich as they were. This thesis would not be the same without them. The people I will individually thank are those who made me discover Computer Vision. My interest in Vision began with the outstanding Cognitive Science lec- ture of Jean Petitot who was also the one who advised me to follow the MVA master. My thanks also go to Renaud Keriven who welcomed me in the Imagine Group of the ENPC, and sent me to discover the Com- puter Vision group of Daniel Cremers in Munich and Computer Graphics at Adobe in Boston. In Munich Bastian Goldluecke kindly helped me discovering energy min- imization and writing my first paper. I also had the chance to meet Ul- rich Schlickewei who believed in the idea WKS from the very beginning, helped me to develop it and without whom it would not be what it is. I am very grateful to Sylvain Paris for his warm welcome in Boston, his guidance and his encouragements to discover photography and Com- puter Graphics. My thanks also go to Fr´edo Durand who allowed me to discover MIT Graphics group and whose rigor will remain an inspiration. I had the chance to have two outstanding PhD advisors. Daniel Cre- mers built in Munich an impressive, friendly and welcoming group and I am especially grateful to him for the confidence he showed and gave me. Josef Sivic shared with me his passion for research and Vision. His vision, the support and guidance he gave me were invaluable. Thanks to him I also had the chance to meet and work with Bryan Russell and Alyosha Efros. Finally, I am deeply indebted to Josef for correcting and proofreading this thesis. — M. Aubry : Representing 3D models for alignment and recognition — 6 Last but not least, I am grateful to Leonidas Guibas and Martial Hebert who agreed to be the “rapporteurs” of this thesis and to Jean Ponce who agreed to be my thesis committee director. I cannot finish without mentioning my family, whose support, under- standing and influence are beyond words, in particular my brothers, Thomas and Thibaut, my parents, Gabriel and B´eatrice, and my grand- mother Fran¸coise. — M. Aubry : Representing 3D models for alignment and recognition — CONTENTS 7 Contents 1 Introduction 11 1.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.3 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.3.1 3D local descriptors . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.3.2 Instance-level 2D-3D alignment . . . . . . . . . . . . . . . . . . 16 1.3.3 Category-level 2D-3D object recognition . . . . . . . . . . . . . 18 1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 1.4.1 Wave Kernel Signature . . . . . . . . . . . . . . . . . . . . . . . 20 1.4.2 3D discriminative visual elements . . . . . . . . . . . . . . . . . 21 1.5 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 1.6 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2 Background 25 2.1 3D shape analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.1.1 From the ideal shapes to discrete 3D models . . . . . . . . . . . 26 2.1.2 3D point descriptors . . . . . . . . . . . . . . . . . . . . . . . . 28 2.1.3 3D shape alignment methods . . . . . . . . . . . . . . . . . . . 33 2.2 Instance-level 2D-3D alignment . . . . . . . . . . . . . . . . . . . . . . 36 — M. Aubry : Representing 3D models for alignment and recognition — CONTENTS 8 2.2.1 Contour-based methods . . . . . . . . . . . . . . . . . . . . . . 36 2.2.2 Local features for alignment . . . . . . . . . . . . . . . . . . . . 40 2.2.3 Global features for alignment . . . . . . . . . . . . . . . . . . . 46 2.2.4 Relationship to our method . . . . . . . . . . . . . . . . . . . . 47 2.3 Category-level 2D-3D alignment . . . . . . . . . . . . . . . . . . . . . . 47 2.3.1 2D methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2.3.2 3D methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 2.3.3 Relationship to our method . . . . . . . . . . . . . . . . . . . . 53 3 Wave Kernel Signature 55 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 3.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.1.2 From Quantum Mechanics to shape analysis . . . . . . . . . . . 57 3.1.3 Spectral Methods for shape analysis . . . . . . . . . . . . . . . . 59 3.2 The Wave Kernel Signature . . . . . . . . . . . . . . . . . . . . . . . . 65 3.2.1 From heat diffusion to Quantum Mechanics . . . . . . . . . . . 65 3.2.2 Schr¨odinger equation on a surface . . . . . . . . . . . . . . . . . 67 3.2.3 A spectral signature for shapes . . . . . . . . . . . . . . . . . . 68 3.2.4 Global vs. local WKS . . . . . . . . . . . . . . . . . . . . . . . 70 3.3 Mathematical Analysis of the WKS . . . . . . . . . . . . . . . . . . . . 71 3.3.1 Stability analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 71 3.3.2 Spectral analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 75 3.3.3 Invariance and discrimination . . . . . . . . . . . . . . . . . . . 76 3.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 3.4.1 Qualitative analysis . . . . . . . . . . . . . . . . . . . . . . . . . 77 — M. Aubry : Representing 3D models for alignment and recognition — CONTENTS 9 3.4.2 Quantitative evaluation. . . . . . . . . . . . . . . . . . . . . . . 81 3.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4 Painting-to-3D Alignment 91 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 4.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.1.2 From locally invariant to discriminatively trained features . . . 93 4.1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.2 3D discriminative visual elements . . . . . . . . . . . . . . . . . . . . . 96 4.2.1 Learning 3D discriminative visual elements . . . . . . . . . . . . 97 4.2.2 Matching as classification . . . . . . . . . . . . . . . . . . . . . 98 4.3 Discriminative visual elements for painting-to-3D alignment . . . . . . 100 4.3.1 View selection and representation . . . . . . . . . . . . . . . . . 100 4.3.2 Least squares model for visual element selection and matching . 102 4.3.3 Calibrated discriminative matching . . . . . . . . . . . . . . . . 107 4.3.4 Filtering elements unstable across viewpoint . . . . . . . . . . . 108 4.3.5 Robust matching . . . . . . . . . . . . . . . . . . . . . . . . . . 111 4.3.6 Recovering viewpoint . . . . . . . . . . . . . . . . . . . . . . . . 111 4.3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 4.4 Results and validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 4.4.1 Dataset for painting-to-3D alignment . . . . . . . . . . . . . . . 113 4.4.2 Qualitative results . . . . . . . . . . . . . . . . . . . . . . . . . 114 4.4.3 Quantitative evaluation. . . . . . . . . . . . . . . . . . . . . . . 117 4.4.4 Algorithm analysis . . . . . . . . . . . . . . . . . . . . . . . . . 124 — M. Aubry : Representing 3D models for alignment and recognition —
Description: