Representing 3D models for alignment and recognition
Mathieu Aubry
To cite this version:
Mathieu Aubry. Representing 3D models for alignment and recognition. Computer Vision and Pattern
Recognition [cs.CV]. Ecole normale supérieure - ENS PARIS, 2015. English. NNT: 2015ENSU0006.
tel-01160300v2
HAL Id: tel-01160300
https://theses.hal.science/tel-01160300v2
Submitted on 11 Apr 2018
HAL is a multi-disciplinary open access archive for the deposit and
dissemination of scientific research documents, whether they are
published or not. The documents may come from teaching and research
institutions in France or abroad, or from public or private research
centers.
Doctoral thesis
submitted for the degree of
Doctor of the École normale supérieure
Doctoral school: ED 386, Sciences mathématiques de Paris Centre
Discipline or specialty: Computer Science
Presented and defended by: Mathieu Aubry, on 8 May 2015
Title: Representing 3D models for alignment and recognition
Research unit: WILLOW team (CNRS/ENS/INRIA UMR 8548)
Thesis supervised by: Daniel Cremers (TU Munich) and Josef Sivic (INRIA-ENS)
Jury members: Jean Ponce (ENS), Léonidas Guibas (Stanford), Bryan Russell (Adobe), Martial Hebert (CMU)
Abstract
Thanks to the success of 3D reconstruction algorithms and the development
of online tools for computer-aided design (CAD), the number of publicly
available 3D models has grown significantly in recent years, and will
continue to do so. This thesis investigates representations of 3D models
for 3D shape matching, instance-level 2D-3D alignment, and category-level
2D-3D recognition.
The geometry of a 3D shape can be represented almost completely by the
eigenfunctions and eigenvalues of the Laplace-Beltrami operator on the
shape. We use this mathematically elegant representation to characterize
points on the shape, with a new notion of scale. This 3D point signature
can be interpreted in the framework of quantum mechanics and we call it
the Wave Kernel Signature (WKS). We show that it has advantages with
respect to the previous state-of-the-art shape descriptors, and can be
used for 3D shape matching, segmentation and recognition.
— M. Aubry : Representing 3D models for alignment and recognition —
A key element for understanding images is the ability to align an object
depicted in an image to its given 3D model. We tackle this instance-level
2D-3D alignment problem for arbitrary 2D depictions including drawings,
paintings, and historical photographs. This is a tremendously difficult
task as the appearance and scene structure in the 2D depictions can be
very different from the appearance and geometry of the 3D model, e.g.,
due to the specific rendering style, drawing error, age, lighting or
change of seasons. We represent the 3D model of an entire architectural
site by a set of visual parts learned from rendered views of the site. We
then develop a procedure to match those scene parts, which we call 3D
discriminative visual elements, to the 2D depiction of the architectural
site. We validate our method on a newly collected dataset of
non-photographic and historical depictions of three architectural sites.
We extend this approach to describe not only a single architectural site
but an entire object category, represented by a large collection of 3D
CAD models. We develop a category-level 2D-3D alignment method that not
only detects objects in cluttered images but also identifies their
approximate style and viewpoint. We evaluate our approach both
qualitatively and quantitatively on a subset of the challenging Pascal
VOC 2012 images of the “chair” category using a reference library of 1394
CAD models downloaded from the Internet.
Acknowledgments
I first want to thank all the people I will never forget but cannot list,
who inspired, supported or advised me, shared coffees, drinks or thoughts
with me, and made these last years as rich as they were. This thesis
would not be the same without them.
The people I will individually thank are those who made me discover
Computer Vision.
My interest in Vision began with the outstanding Cognitive Science
lecture of Jean Petitot, who was also the one who advised me to follow
the MVA master. My thanks also go to Renaud Keriven, who welcomed me in
the Imagine Group of the ENPC, and sent me to discover the Computer
Vision group of Daniel Cremers in Munich and Computer Graphics at Adobe
in Boston.
In Munich, Bastian Goldluecke kindly helped me discover energy
minimization and write my first paper. I also had the chance to meet
Ulrich Schlickewei, who believed in the idea of the WKS from the very
beginning, helped me to develop it and without whom it would not be what
it is.
I am very grateful to Sylvain Paris for his warm welcome in Boston, his
guidance and his encouragement to discover photography and Computer
Graphics. My thanks also go to Frédo Durand, who allowed me to discover
the MIT Graphics group and whose rigor will remain an inspiration.
I had the chance to have two outstanding PhD advisors. Daniel Cremers
built in Munich an impressive, friendly and welcoming group and I am
especially grateful to him for the confidence he showed and gave me.
Josef Sivic shared with me his passion for research and Vision. His
vision, the support and guidance he gave me were invaluable. Thanks to
him I also had the chance to meet and work with Bryan Russell and Alyosha
Efros. Finally, I am deeply indebted to Josef for correcting and
proofreading this thesis.
Last but not least, I am grateful to Leonidas Guibas and Martial Hebert
who agreed to be the “rapporteurs” of this thesis and to Jean Ponce who
agreed to be my thesis committee director.
I cannot finish without mentioning my family, whose support,
understanding and influence are beyond words, in particular my brothers,
Thomas and Thibaut, my parents, Gabriel and Béatrice, and my grandmother
Françoise.
Contents
1 Introduction
1.1 Goals
1.2 Motivation
1.3 Challenges
1.3.1 3D local descriptors
1.3.2 Instance-level 2D-3D alignment
1.3.3 Category-level 2D-3D object recognition
1.4 Contributions
1.4.1 Wave Kernel Signature
1.4.2 3D discriminative visual elements
1.5 Thesis outline
1.6 Publications
2 Background
2.1 3D shape analysis
2.1.1 From the ideal shapes to discrete 3D models
2.1.2 3D point descriptors
2.1.3 3D shape alignment methods
2.2 Instance-level 2D-3D alignment
2.2.1 Contour-based methods
2.2.2 Local features for alignment
2.2.3 Global features for alignment
2.2.4 Relationship to our method
2.3 Category-level 2D-3D alignment
2.3.1 2D methods
2.3.2 3D methods
2.3.3 Relationship to our method
3 Wave Kernel Signature
3.1 Introduction
3.1.1 Motivation
3.1.2 From Quantum Mechanics to shape analysis
3.1.3 Spectral Methods for shape analysis
3.2 The Wave Kernel Signature
3.2.1 From heat diffusion to Quantum Mechanics
3.2.2 Schrödinger equation on a surface
3.2.3 A spectral signature for shapes
3.2.4 Global vs. local WKS
3.3 Mathematical Analysis of the WKS
3.3.1 Stability analysis
3.3.2 Spectral analysis
3.3.3 Invariance and discrimination
3.4 Experimental Results
3.4.1 Qualitative analysis
3.4.2 Quantitative evaluation
3.5 Applications
3.6 Conclusion
4 Painting-to-3D Alignment
4.1 Introduction
4.1.1 Motivation
4.1.2 From locally invariant to discriminatively trained features
4.1.3 Overview
4.2 3D discriminative visual elements
4.2.1 Learning 3D discriminative visual elements
4.2.2 Matching as classification
4.3 Discriminative visual elements for painting-to-3D alignment
4.3.1 View selection and representation
4.3.2 Least squares model for visual element selection and matching
4.3.3 Calibrated discriminative matching
4.3.4 Filtering elements unstable across viewpoint
4.3.5 Robust matching
4.3.6 Recovering viewpoint
4.3.7 Summary
4.4 Results and validation
4.4.1 Dataset for painting-to-3D alignment
4.4.2 Qualitative results
4.4.3 Quantitative evaluation
4.4.4 Algorithm analysis