Hamed Habibi Aghdam
Elnaz Jahani Heravi

Guide to Convolutional Neural Networks
A Practical Application to Traffic-Sign Detection and Classification

Hamed Habibi Aghdam, University Rovira i Virgili, Tarragona, Spain
Elnaz Jahani Heravi, University Rovira i Virgili, Tarragona, Spain

ISBN 978-3-319-57549-0
ISBN 978-3-319-57550-6 (eBook)
DOI 10.1007/978-3-319-57550-6
Library of Congress Control Number: 2017938310

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To my wife, Elnaz, who possesses the most accurate and reliable optimization method and guides me toward the global optima of life.
Hamed Habibi Aghdam

Preface

The general paradigm in solving a computer vision problem is to represent a raw image using a more informative vector, called a feature vector, and to train a classifier on top of the feature vectors collected from the training set. From a classification perspective, there are several off-the-shelf methods, such as gradient boosting, random forests and support vector machines, that are able to accurately model nonlinear decision boundaries. Hence, solving a computer vision problem mainly depends on the feature extraction algorithm.

Feature extraction methods such as the scale-invariant feature transform, histogram of oriented gradients, banks of Gabor filters, local binary patterns, bag of features and Fisher vectors performed well compared with their predecessors. These methods mainly create the feature vector in several steps. For example, the scale-invariant feature transform and the histogram of oriented gradients first compute the gradient of the image. Then, they pool gradient magnitudes over different regions and concatenate them in order to create the final feature vector. Similarly, bag of features and Fisher vectors start by extracting a feature vector, such as a histogram of oriented gradients, on regions around a set of salient points in the image. Then, these features are pooled again in order to create higher-level feature vectors.

Despite the great efforts of the computer vision community, the above hand-engineered features were not able to properly model large classes of natural objects. The advent of convolutional neural networks, large datasets and parallel computing hardware changed the course of computer vision.
Instead of designing feature vectors by hand, convolutional neural networks learn a composite feature transformation function that makes classes of objects linearly separable in the feature space.

Recently, convolutional neural networks have surpassed humans in different tasks, such as classification of natural objects and classification of traffic signs. After their great success, convolutional neural networks have become the first choice for learning features from training data. One of the fields that has been greatly influenced by convolutional neural networks is the automotive industry. Tasks such as pedestrian detection, car detection, traffic sign recognition, traffic light recognition and road scene understanding are rarely done using hand-crafted features anymore.

Designing, implementing and evaluating are crucial steps in developing a successful computer vision-based method. In order to design a neural network, one must have basic knowledge of the underlying process of neural networks and training algorithms. Implementing a neural network requires deep knowledge of the libraries that can be used for this purpose. Moreover, neural networks must be evaluated quantitatively and qualitatively before they are used in practical applications.

Instead of going into the details of mathematical concepts, this book tries to adequately explain the fundamentals of neural networks and show how to implement and assess them in practice. Specifically, Chap. 2 covers basic concepts related to classification and derives the idea of feature learning using neural networks, starting from linear classifiers. Then, Chap. 3 shows how to derive convolutional neural networks from fully connected neural networks. It also reviews classical network architectures and mentions different techniques for evaluating neural networks. Next, Chap. 4 thoroughly discusses a practical library for implementing convolutional neural networks.
It also explains how to use the Python interface of this library in order to create and evaluate neural networks. The next two chapters explain practical examples of detection and classification of traffic signs using convolutional neural networks. Finally, the last chapter introduces a few techniques for visualizing neural networks using the Python interface.

Graduate and undergraduate students, as well as machine vision practitioners, can use the book to gain hands-on knowledge in the field of convolutional neural networks. Exercises have been designed to help readers acquire deeper knowledge in the field. Last but not least, Python scripts have been provided so that readers will be able to reproduce the results and practice the topics of this book easily.

Book's Website

Most of the code explained in this book is available at https://github.com/pcnn/. The code is written in Python 2.7 and requires the numpy and matplotlib libraries. You can download and try the code on your own.

Tarragona, Spain
Hamed Habibi Aghdam

Contents

1 Traffic Sign Detection and Recognition
  1.1 Introduction
  1.2 Challenges
  1.3 Previous Work
    1.3.1 Template Matching
    1.3.2 Hand-Crafted Features
    1.3.3 Feature Learning
    1.3.4 ConvNets
  1.4 Summary
  References
2 Pattern Classification
  2.1 Formulation
    2.1.1 K-Nearest Neighbor
  2.2 Linear Classifier
    2.2.1 Training a Linear Classifier
    2.2.2 Hinge Loss
    2.2.3 Logistic Regression
    2.2.4 Comparing Loss Function
  2.3 Multiclass Classification
    2.3.1 One Versus One
    2.3.2 One Versus Rest
    2.3.3 Multiclass Hinge Loss
    2.3.4 Multinomial Logistic Function
  2.4 Feature Extraction
  2.5 Learning Φ(x)
  2.6 Artificial Neural Networks
    2.6.1 Backpropagation
    2.6.2 Activation Functions
    2.6.3 Role of Bias
    2.6.4 Initialization
    2.6.5 How to Apply on Images
  2.7 Summary
  2.8 Exercises
  References
3 Convolutional Neural Networks
  3.1 Deriving Convolution from a Fully Connected Layer
    3.1.1 Role of Convolution
    3.1.2 Backpropagation of Convolution Layers
    3.1.3 Stride in Convolution
  3.2 Pooling
    3.2.1 Backpropagation in Pooling Layer
  3.3 LeNet
  3.4 AlexNet
  3.5 Designing a ConvNet
    3.5.1 ConvNet Architecture
    3.5.2 Software Libraries
    3.5.3 Evaluating a ConvNet
  3.6 Training a ConvNet
    3.6.1 Loss Function
    3.6.2 Initialization
    3.6.3 Regularization
    3.6.4 Learning Rate Annealing
  3.7 Analyzing Quantitative Results
  3.8 Other Types of Layers
    3.8.1 Local Response Normalization
    3.8.2 Spatial Pyramid Pooling
    3.8.3 Mixed Pooling
    3.8.4 Batch Normalization
  3.9 Summary
  3.10 Exercises
  References
4 Caffe Library
  4.1 Introduction
  4.2 Installing Caffe
  4.3 Designing Using Text Files
    4.3.1 Providing Data
    4.3.2 Convolution Layers
    4.3.3 Initializing Parameters
    4.3.4 Activation Layer
    4.3.5 Pooling Layer
    4.3.6 Fully Connected Layer
    4.3.7 Dropout Layer
    4.3.8 Classification and Loss Layers
  4.4 Training a Network
  4.5 Designing in Python
  4.6 Drawing Architecture of Network
  4.7 Training Using Python
  4.8 Evaluating Using Python
  4.9 Save and Restore Networks
  4.10 Python Layer in Caffe
  4.11 Summary
  4.12 Exercises
  Reference
5 Classification of Traffic Signs
  5.1 Introduction
  5.2 Related Work
    5.2.1 Template Matching
    5.2.2 Hand-Crafted Features
    5.2.3 Sparse Coding
    5.2.4 Discussion
    5.2.5 ConvNets
  5.3 Preparing Dataset
    5.3.1 Splitting Data
    5.3.2 Augmenting Dataset
    5.3.3 Static Versus On-the-Fly Augmenting
    5.3.4 Imbalanced Dataset
    5.3.5 Preparing the GTSRB Dataset
  5.4 Analyzing Training/Validation Curves
  5.5 ConvNets for Classification of Traffic Signs
  5.6 Ensemble of ConvNets
    5.6.1 Combining Models
    5.6.2 Training Different Models
    5.6.3 Creating Ensemble
  5.7 Evaluating Networks
    5.7.1 Misclassified Images
    5.7.2 Cross-Dataset Analysis and Transfer Learning
    5.7.3 Stability of ConvNet
    5.7.4 Analyzing by Visualization
  5.8 Analyzing by Visualizing
    5.8.1 Visualizing Sensitivity
    5.8.2 Visualizing the Minimum Perception
    5.8.3 Visualizing Activations
  5.9 More Accurate ConvNet
    5.9.1 Evaluation
    5.9.2 Stability Against Noise
    5.9.3 Visualization
Description: