
Semantic transfer with deep neural networks PDF

133 Pages · 2017 · 11.64 MB · English
by Mandar Dixit

Preview: Semantic transfer with deep neural networks

UC San Diego Electronic Theses and Dissertations

Title: Semantic transfer with deep neural networks
Permalink: https://escholarship.org/uc/item/61v536xt
Author: Dixit, Mandar
Publication Date: 2017
Peer reviewed | Thesis/dissertation
eScholarship.org, powered by the California Digital Library, University of California

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Semantic transfer with deep neural networks

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering (Intelligent Systems, Robotics, and Control)

by

Mandar Dixit

Committee in charge:

Professor Nuno Vasconcelos, Chair
Professor Manmohan Chandraker
Professor Kenneth Kreutz-Delgado
Professor Gert Lanckriet
Professor Zhuowen Tu

2017

Copyright Mandar Dixit, 2017. All rights reserved.

The dissertation of Mandar Dixit is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Chair

University of California, San Diego
2017

DEDICATION

Dedicated to my mother

TABLE OF CONTENTS

Signature Page
Dedication
Table of Contents
List of Figures
List of Tables
Acknowledgements
Vita
Abstract

Chapter I. Introduction
  I.A. Visual Recognition
  I.B. Transfer Learning
  I.C. Contributions
    I.C.1. Object-to-Scene Semantic Transfer
    I.C.2. Semantic Covariance Modeling
    I.C.3. Attribute Guided Data Augmentation
  I.D. Organization of the thesis

Chapter II. Semantic Image Representations for Object to Scene Transfer
  II.A. Scene Classification
  II.B. Image representations
    II.B.1. Bag of Features
    II.B.2. Bag of Semantics
  II.C. BoF embeddings
    II.C.1. CNN embedding
    II.C.2. FV embedding
    II.C.3. Comparison
  II.D. Semantic FV embedding
    II.D.1. The BoS
    II.D.2. Evaluation
    II.D.3. Limitations
    II.D.4. Natural parameter space
    II.D.5. Natural Parameter FVs
    II.D.6. Dirichlet Mixture FVs
    II.D.7. Evaluation of NP Embeddings
    II.D.8. Ablation Analysis
    II.D.9. NP-BoS vs. Comparable Transformations
  II.E. Related work
    II.E.1. FVs of lower layer activations
    II.E.2. Fine Tuning
    II.E.3. The Places CNN
  II.F. Evaluation
    II.F.1. Experimental setup
    II.F.2. The role of invariance
    II.F.3. Comparison to the state of the art
  II.G. Conclusions
  II.H. Acknowledgements

Chapter III. Improving Transfer with a Semantic Covariance Model
  III.A. Introduction
  III.B. Fisher scores
    III.B.1. Fisher Scores from EM
    III.B.2. Bag of features
    III.B.3. Gaussian Mixture Fisher Vectors
    III.B.4. Fisher Scores for the Mixture of Factor Analyzers
    III.B.5. Fisher Information
    III.B.6. Evaluating MFA embeddings
  III.C. Related work
    III.C.1. Gradients based on Sparse Coding
    III.C.2. Bilinear Pooling
  III.D. Scene Image Classification
    III.D.1. Multi-scale learning and Deep CNNs
    III.D.2. Comparison with ImageNet based Classifiers
    III.D.3. Task transfer performance
  III.E. Conclusion
  III.F. Acknowledgements

Chapter IV. Attribute Trajectory Transfer for Data Augmentation
  IV.A. Introduction
  IV.B. Related work
  IV.C. Attribute Trajectory Transfer
  IV.D. Architecture
    IV.D.1. Attribute regression
    IV.D.2. Feature regression
  IV.E. Experiments
    IV.E.1. Attribute regression
    IV.E.2. Feature regression
    IV.E.3. One-shot object recognition
    IV.E.4. Object-based one-shot scene recognition
  IV.F. Analyzing AGA
  IV.G. Discussion
  IV.H. Acknowledgements

Chapter V. Conclusions

Bibliography

LIST OF FIGURES

Figure I.1 Types of Transfer Learning
Figure II.1 The Bag of Features (BoF) embedding
Figure II.2 The Bag of Semantics (BoS) embedding
Figure II.3 Example of an ImageNet based Bag of Semantics
Figure II.4 Convolutional Neural Net based semantic image representation
Figure II.5 Effects of probability mapping on Euclidean spaces
Figure II.6 The scene classification performance of a DMM-FV varying with the number of mixture components
Figure III.1 Performance of latent space statistics of (III.12) for different latent space dimensions
Figure III.2 Comparison of MFA-FS obtained with different mixture models
Figure IV.1 An illustration of the proposed trajectory learning and transfer method
Figure IV.2 Schematic illustration of data augmentation by transferring an approximated attribute trajectory (here, for variation in object pose)
Figure IV.3 Distinction between class instances in standard natural image datasets (top) and image trajectories (bottom)
Figure IV.4 Architecture of the attribute regressor γ
Figure IV.5 Illustration of attribute-guided trajectory approximation
Figure IV.6 Illustration of the encoder-decoder network for AGA
Figure IV.7 Illustration of training data generation for AGA training
Figure IV.8 Visualizing gradients generated by the network to affect depth change
Figure IV.9 One-shot regularization experiment where the weight α is varied from 0.01 to 0.99
Figure IV.10 One-shot and five-shot recognition with only augmented data (original omitted)
Figure IV.11 Performance of few-shot (1-10) object recognition on T_1 using AGA
Figure V.1 Map of CNN based transfer learning: across datasets, classes and domains

LIST OF TABLES

Table II.1 Comparison of the ImageNet CNN and FV embeddings on scene and object classification tasks
Table II.2 Evaluation of different Fisher vector encoding techniques over ImageNet BoS
Table II.3 Impact of codebook selection on DMM and GMM FVs in ν(1) space
Table II.4 Ablation analysis of the DMM-FV and the GMM-FV embeddings on ν(1) space
Table II.5 Impact of semantic feature extraction at different scales
Table II.6 Comparison with the state-of-the-art methods using ImageNet trained features
Table II.7 Comparison with a CNN trained on Scenes [96]
Table III.1 Comparison of MFA-FS with semantic "gist" embeddings learned using ImageNet BoS and the Places dataset
Table III.2 Classification accuracy (K = 50, R = 10)
Table III.3 Classification accuracy vs. descriptor size for MFA-FS(Λ) and a comparable GMM-FV(σ)
Table III.4 MFA-FS classification accuracy as a function of patch scale
Table III.5 Comparison with scene classification state-of-the-art
Table III.6 Comparison to task transfer methods using ImageNet CNNs on MIT Indoor
Table III.7 Comparison with the Places trained Scene CNNs
Table III.8 Classification accuracy (K = 50, R = 10)
Table IV.1 Median-Absolute-Error (MAE), for depth / pose, of the attribute regressor, evaluated on left-out (during training) instances of 19 objects from SUN RGBD
Table IV.2 Quality assessment of feature regressor φ^i_k
Table IV.3 One / few shot recognition accuracy (over 500 trials) for three object recognition tasks
Table IV.4 One-shot classification on indoor scenes from [65]
Table IV.5 Retrieval results for unseen objects (T_1) when querying with synthesized features of varying depth
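
The Fisher vector (FV) embeddings that recur throughout Chapters II and III are gradient-based poolings of a bag of features under a generative model. For orientation only, the following is a minimal sketch of the standard Gaussian-mixture Fisher vector (the usual Perronnin-style mean gradient); the dissertation's variants (natural parameter FVs, Dirichlet mixture FVs, MFA Fisher scores) build on this construction and are not reproduced here, and the exact normalization used in the thesis may differ:

% Standard GMM Fisher vector, mean-gradient component (reference sketch only;
% not necessarily the dissertation's exact formulation).
$$
\mathcal{G}^{X}_{\mu_k}
  = \frac{1}{N\sqrt{w_k}} \sum_{i=1}^{N} \gamma_k(x_i)\,\frac{x_i - \mu_k}{\sigma_k},
\qquad
\gamma_k(x_i)
  = \frac{w_k\,\mathcal{N}(x_i;\,\mu_k,\,\sigma_k^2)}{\sum_{j} w_j\,\mathcal{N}(x_i;\,\mu_j,\,\sigma_j^2)},
$$

where X = {x_1, ..., x_N} is the bag of local (e.g., CNN) features extracted from an image and (w_k, μ_k, σ_k) are the weight, mean, and standard deviation of the k-th mixture component.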

