Scott Krig
Computer
Vision Metrics
Textbook Edition
Survey, Taxonomy and Analysis of Computer
Vision, Visual Neuroscience, and Deep Learning
Scott Krig
Krig Research, USA
ISBN 978-3-319-33761-6 ISBN 978-3-319-33762-3 (eBook)
DOI 10.1007/978-3-319-33762-3
Library of Congress Control Number: 2016938637
© Springer International Publishing Switzerland 2016
This Springer imprint is published by Springer Nature.
This work is subject to copyright. All rights are reserved by the Publisher,
whether the whole or part of the material is concerned, specifically
the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and
retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in
this book are believed to be true and accurate at the date of publication. Neither the publisher nor
the authors or the editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
The registered company is Springer International Publishing AG Switzerland
Foreword to the Second Edition
The goal of this second version is to add new materials on deep learning,
neuroscience applied to computer vision, historical developments in neural
networks, and feature learning architectures, particularly neural network
methods. In addition, this second edition cleans up some typos and other
items from the first version. In total, three new chapters are added to survey
the latest feature learning and hierarchical deep learning methods and
architectures. Overall, this book provides a wide survey of computer vision
methods including local feature descriptors, regional and global features, and
feature learning methods, with a taxonomy for organizational purposes.
Analysis is distributed through the book to provide intuition behind the
various approaches, encouraging the reader to think for themselves about
the motivations for each approach, why different methods are created, how
each method is designed and architected, and why it works. Nearly 1000
references to the literature and other materials are provided, making computer
vision and imaging resources accessible at many levels.
My expectation for the reader is this: if you want to learn about 90% of
computer vision, read this book. To learn the other 10%, read the references
provided and spend at least 20 years creating real systems. Reading this
book will take a matter of hours, and reading the references and creating real
systems will take a lifetime to only scratch the surface. We follow the axiom
of the eminent Dr. Jack Sparrow, who has no time for extraneous details, and
here we endeavor to present computer vision materials in a fashion that
makes the fundamentals accessible to many outside the inner circles of
academia:
“I like it. Simple, easy to remember”.
Jack Sparrow, Pirates of the Caribbean
This book is suitable for independent study, reference, or coursework at
the university level and beyond for experienced engineers and scientists. The
chapters are divided in such a way that various courses can be devised to
incorporate a subset of chapters to accommodate course requirements. For
example, typical course titles include “Image Sensors and Image
Processing,” “Computer Vision and Image Processing,” “Applied Computer
Vision and Imaging Optimizations,” “Feature Learning, Deep Learning,
and Neural Network Architectures,” “Computer Vision Architectures,” and
“Computer Vision Survey.” Questions are available for coursework at the
end of each chapter. It is recommended that this book be used as a
complement to other fine books, open source code, and hands-on materials
for study in computer vision and related scientific disciplines, or possibly
used by itself for a higher-level survey course.
This book may be used as required reading to provide a survey
component to academic coursework for science and engineering
disciplines, to complement other texts that contain hands-on and
how-to materials.
This book DOES NOT PROVIDE extensive how-to coding examples,
worked out examples, mathematical proofs, experimental results and
comparisons, or detailed performance data, which are already very well
covered in the bibliography references. The goal is to provide an analysis
across a representative survey of methods, rather than repeating what is
already found in the references. This is not a workbook with open source
code (only a little source code is provided), since there are many fine open
source materials available already, which are referenced for the interested
reader.
Instead, this book DOES PROVIDE an extensive survey, taxonomy, and
analysis of computer vision methods. The goal is to find the intuition behind
the methods surveyed. The book is meant to be read, rather than worked
through. This is not a workbook, but is intended to provide sufficient background
for the reader to find pathways forward into basic or applied research
for a variety of scientific and engineering applications. In some respects, this
work is a museum of computer vision, containing concepts, observations,
oddments, and relics which fascinate me.
The book is designed to complement existing texts and fill a niche in the
literature. The book takes a complete path through computer vision, beginning
with image sensors, image processing, global-regional-local feature
descriptor methods, feature learning and deep learning, neural networks for
computer vision, ground truth data and training, and applied engineering
optimizations across CPU, GPU, and software optimization methods. I
could not find a similar book; otherwise I would not have begun
this work.
This book aims at a survey, taxonomy, and analysis of computer vision
methods from the perspective of the features used—the feature descriptors
themselves, how they are designed and how they are organized. Learning
methods and architectures are necessary and supporting factors, and are
included here for completeness. However, I am personally fascinated by
the feature descriptor methods themselves, and I regard them as an
art-form for mathematically arranging pixel patterns, shapes, and spectra to
reveal how images are created. I regard each feature descriptor as a work of
art, like a painting or mathematical sculpture prepared by an artist, and thus
the perspective of this work is to survey feature descriptor and feature
learning methods and appreciate each one.
As shown in this book over and over again, researchers are finding that a
wide range of feature descriptors are effective, and that one of the keys to
best results seems to be the sheer number of features used in feature
hierarchies, rather than the choice of SIFT vs. pixel patches vs. CNN
features. In the surveys herein we see that many methods for learning and
training are used, many architectures are used, and the consensus seems to be
that hierarchical feature learning is now the mainstay of computer vision,
following on from the pioneering work in convolutional neural networks and
deep learning methods applied to computer vision, which has accelerated
since the new millennium. The older computer vision methods are being
combined with the newer ones, and now applications are beginning to appear
in consumer devices, rather than exotic military and intelligence systems of
the past.
Special thanks to Courtney Clarke at Springer for commissioning this
second version, and providing support and guidance to make the second
version better.
Special thanks to all the wonderful feedback on the first version, which
helped to shape this second version. Vin Ratford and Jeff Bier of the
Embedded Vision Alliance (EVA) arranged to provide copies of the first
version to all EVA members, both hardcopy and e-book versions, and
maintained a feedback web page for review comments—much appreciated.
Thanks to Mike Schmidt and Vadim Pisarevsky for excellent review
comments over the entire book. Juergen Schmidhuber provided links to
historical information on neural networks and other useful information,
Kunihiko Fukushima provided copies of some of his early neural network
research papers, Rahul Suthankar provided updates on key trends in computer
vision, and Hugo LaRochelle provided information and references on
CNN topics and Patrick Cox on HMAX topics. Interesting information was
also provided by Robert Gens and Andrej Karpathy. And I would like to re-thank
those who contributed to the first version, including Paul Rosin regarding
synthetic interest points, Yann LeCun for providing key references into deep
learning and convolutional networks, Shree Nayar for permission to use a
few images, Luciano Oviedo for blue sky discussions, and many others who
have influenced my thinking including Alexandre Alahi, Steve Seitz, Bryan
Russel, Liefeng Bo, Xiaofeng Ren, Gutemberg Guerra-filho, Harsha
Viswana, Dale Hitt, Joshua Gleason, Noah Snavely, Daniel Scharstein,
Thomas Salmon, Richard Baraniuk, Carl Vodrick, Hervé Jégou, Andrew
Richardson, Ofri Weschler, Hong Jiang, Andy Kuzma, Michael Jeronimo,
Eli Turiel, and many others whom I have failed to mention.
As usual, thanks to my wife for patience with my research, and also for
providing the “governor” switch to pace my work, without which I would
likely burn out more completely. And most of all, special thanks to the great
inventor who inspires us all, Anno Domini 2016.
Scott Krig
Contents
1 Image Capture and Representation
  Image Sensor Technology
    Sensor Materials
    Sensor Photodiode Cells
    Sensor Configurations: Mosaic, Foveon, BSI
    Dynamic Range, Noise, Super Resolution
    Sensor Processing
    De-Mosaicking
    Dead Pixel Correction
    Color and Lighting Corrections
    Geometric Corrections
  Cameras and Computational Imaging
    Overview of Computational Imaging
    Single-Pixel Computational Cameras
    2D Computational Cameras
    3D Depth Camera Systems
  3D Depth Processing
    Overview of Methods
    Problems in Depth Sensing and Processing
    Monocular Depth Processing
  3D Representations: Voxels, Depth Maps, Meshes, and Point Clouds
  Summary
  Chapter 1: Learning Assignments
2 Image Pre-Processing
  Perspectives on Image Processing
  Problems to Solve During Image Preprocessing
    Vision Pipelines and Image Preprocessing
    Corrections
    Enhancements
    Preparing Images for Feature Extraction
  The Taxonomy of Image Processing Methods
    Point
    Line
    Area
    Algorithmic
Description: Based on the successful 2014 book published by Apress, this textbook edition is expanded to provide a comprehensive history and state-of-the-art survey for fundamental computer vision methods. With over 800 essential references, as well as chapter-by-chapter learning assignments, both students and r