Table Of ContentProgramming Computer Vision
with Python
>
m
o
c
k.
o
o
b
e
w
o
w
w.
w
w
<
k
o
o
B
e
w!
o
W
m
o
d fr
a
o
nl
w
o
D
Jan Erik Solem
Beijing•Cambridge•Farnham•Ko¨ln•Sebastopol•Tokyo
ProgrammingComputerVisionwithPython
byJanErikSolem
Copyright©2012JanErikSolem.Allrightsreserved.
PrintedintheUnitedStatesofAmerica
PublishedbyO’ReillyMedia,Inc.,1005GravensteinHighwayNorth,Sebastopol,CA95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online
editionsarealsoavailableformosttitles(http://my.safaribooksonline.com).Formoreinformation,
contactourcorporate/institutionalsalesdepartment:(800)998-9938orcorporate@oreilly.com.
Interiordesigner: DavidFutato Projectmanager: PaulC.Anagnostopoulos
Coverdesigner: KarenMontgomery Copyeditor: PriscillaStevens
Editors: AndyOram,MikeHendrickson Proofreader: RichardCamp
Productioneditor: HollyBauer Illustrator: LaurelMuller
June2012 Firstedition
RevisionHistoryfortheFirstEdition:
2012-06-11 Firstrelease
Seehttp://oreilly.com/catalog/errata.csp?isbn=0636920022923forreleasedetails.
NutshellHandbook,theNutshellHandbooklogo,andtheO’Reillylogoareregisteredtrademarks
ofO’ReillyMedia,Inc.ProgrammingComputerVisionwithPython,theimageofabullheadfish,
andrelatedtradedressaretrademarksofO’ReillyMedia,Inc.
Manyofthedesignationsusedbymanufacturersandsellerstodistinguishtheirproductsare
claimedastrademarks.Wherethosedesignationsappearinthisbook,andO’ReillyMedia,Inc.,
wasawareofatrademarkclaim,thedesignationshavebeenprintedincapsorinitialcaps.
Whileeveryprecautionhasbeentakeninthepreparationofthisbook,thepublisherandauthors
assumenoresponsibilityforerrorsoromissions, orfordamagesresultingfromtheuseofthe
informationcontainedherein.
ISBN:978-1-449-31654-9
[M]
Table of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
1. BasicImageHandlingandProcessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 PIL—ThePythonImagingLibrary 1
1.2 Matplotlib 3
1.3 NumPy 7
1.4 SciPy 16
1.5 AdvancedExample:ImageDe-Noising 23
Exercises 26
ConventionsfortheCodeExamples 27
2. LocalImageDescriptors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1 HarrisCornerDetector 29
2.2 SIFT—Scale-InvariantFeatureTransform 36
2.3 MatchingGeotaggedImages 44
Exercises 51
3. ImagetoImageMappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.1 Homographies 53
3.2 WarpingImages 57
3.3 CreatingPanoramas 70
Exercises 77
4. CameraModelsandAugmentedReality. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.1 ThePin-HoleCameraModel 79
4.2 CameraCalibration 84
4.3 PoseEstimationfromPlanesandMarkers 86
4.4 AugmentedReality 89
Exercises 98
iii
5. MultipleViewGeometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.1 EpipolarGeometry 99
5.2 ComputingwithCamerasand3DStructure 107
5.3 MultipleViewReconstruction 113
5.4 StereoImages 120
Exercises 125
6. ClusteringImages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.1 K-MeansClustering 127
6.2 HierarchicalClustering 133
6.3 SpectralClustering 140
Exercises 145
7. SearchingImages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.1 Content-BasedImageRetrieval 147
7.2 VisualWords 148
7.3 IndexingImages 151
7.4 SearchingtheDatabaseforImages 155
7.5 RankingResultsUsingGeometry 160
7.6 BuildingDemosandWebApplications 162
Exercises 165
8. ClassifyingImageContent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.1 K-NearestNeighbors 167
8.2 BayesClassifier 175
8.3 SupportVectorMachines 179
8.4 OpticalCharacterRecognition 183
Exercises 189
9. ImageSegmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
9.1 GraphCuts 191
9.2 SegmentationUsingClustering 200
9.3 VariationalMethods 204
Exercises 206
10. OpenCV. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
10.1 TheOpenCVPythonInterface 209
10.2 OpenCVBasics 210
10.3 ProcessingVideo 213
10.4 Tracking 216
10.5 MoreExamples 223
Exercises 226
iv | TableofContents
A. InstallingPackages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
A.1 NumPyandSciPy 227
A.2 Matplotlib 228
A.3 PIL 228
A.4 LibSVM 228
A.5 OpenCV 229
A.6 VLFeat 230
A.7 PyGame 230
A.8 PyOpenGL 230
A.9 Pydot 230
A.10 Python-graph 231
A.11 Simplejson 231
A.12 PySQLite 232
A.13 CherryPy 232
B. ImageDatasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
B.1 Flickr 233
B.2 Panoramio 234
B.3 OxfordVisualGeometryGroup 235
B.4 UniversityofKentuckyRecognitionBenchmarkImages 235
B.5 Other 235
C. ImageCredits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
C.1 ImagesfromFlickr 237
C.2 OtherImages 238
C.3 Illustrations 238
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
TableofContents | v
>
m
o
c
k.
o
o
b
e
w
o
w
w.
w
w
<
k
o
o
B
e
w!
o
W
m
o
d fr
a
o
nl
w
o
D
Preface
Today, images and video are everywhere. Online photo-sharing sites and social net-
workshavetheminthebillions.Searchengineswillproduceimagesofjustaboutany
conceivable query. Practically all phones and computers come with built-in cameras.
Itisnotuncommonforpeopletohavemanygigabytesofphotosandvideosontheir
devices.
Programmingacomputeranddesigningalgorithmsforunderstandingwhatisinthese
imagesisthefieldofcomputervision.Computervisionpowersapplicationslikeimage
search,robotnavigation,medicalimageanalysis,photomanagement,andmanymore.
The idea behind this book is to give an easily accessible entry point to hands-on
computervisionwithenoughunderstandingoftheunderlyingtheoryandalgorithms
tobeafoundationforstudents,researchers,andenthusiasts.ThePythonprogramming
language,thelanguagechoiceofthisbook,comeswithmanyfreelyavailable,powerful
modulesforhandlingimages,mathematicalcomputing,anddatamining.
Whenwritingthisbook,Ihaveusedthefollowingprinciplesasaguideline.Thebook
should:
Bewritteninanexploratorystyleandencouragereaderstofollowtheexampleson
.
theircomputersastheyarereadingthetext.
Promoteandusefreeandopensoftwarewithalowlearningthreshold.Pythonwas
.
theobviouschoice.
Becompleteandself-contained.Thisbookdoesnotcoverallofcomputervision
.
butratheritshouldbecompleteinthatallcodeispresentedandexplained.The
readershouldbeabletoreproducetheexamplesandbuilduponthemdirectly.
Bebroadratherthandetailed,inspiringandmotivationalratherthantheoretical.
.
Inshort,itshouldactasasourceofinspirationforthoseinterestedinprogramming
computervisionapplications.
vii
PrerequisitesandOverview
Thisbooklooksattheoryandalgorithmsforawiderangeofapplicationsandproblems.
Hereisashortsummaryofwhattoexpect.
WhatYouNeedtoKnow
Basicprogrammingexperience.Youneedtoknowhowtouseaneditorandrun
.
scripts,howtostructurecodeaswellasbasicdatatypes.FamiliaritywithPython
orotherscriptinglanguageslikeRubyorMatlabwillhelp.
Basicmathematics.Tomakefulluseoftheexamples,ithelpsifyouknowabout
.
matrices,vectors,matrixmultiplication,andstandardmathematicalfunctionsand
conceptslikederivativesandgradients.Someofthemoreadvancedmathematical
examplescanbeeasilyskipped.
WhatYouWillLearn
Hands-onprogrammingwithimagesusingPython.
.
Computervisiontechniquesbehindawidevarietyofreal-worldapplications.
.
Many of the fundamental algorithms and how to implement and apply them
.
yourself.
The code examples in this book will show you object recognition, content-based
imageretrieval,imagesearch,opticalcharacterrecognition,opticalflow,tracking,3D
reconstruction,stereoimaging,augmentedreality,poseestimation,panoramacreation,
imagesegmentation,de-noising,imagegrouping,andmore.
ChapterOverview
Chapter1,“BasicImageHandlingandProcessing”
IntroducesthebasictoolsforworkingwithimagesandthecentralPythonmodules
usedinthebook.Thischapteralsocoversmanyfundamentalexamplesneededfor
theremainingchapters.
Chapter2,“LocalImageDescriptors”
Explainsmethodsfordetectinginterestpointsinimagesandhowtousethemto
findcorrespondingpointsandregionsbetweenimages.
Chapter3,“ImagetoImageMappings”
Describesbasictransformationsbetweenimagesandmethodsforcomputingthem.
Examplesrangefromimagewarpingtocreatingpanoramas.
Chapter4,“CameraModelsandAugmentedReality”
Introduces how to model cameras, generate image projections from 3D space to
imagefeatures,andestimatethecameraviewpoint.
Chapter5,“MultipleViewGeometry”
Explainshowtoworkwithseveralimagesofthesamescene,thefundamentalsof
multiple-viewgeometry,andhowtocompute3Dreconstructionsfromimages.
viii | Preface
Description:If you want a basic understanding of computer vision’s underlying theory and algorithms, this hands-on introduction is the ideal place to start. You’ll learn techniques for object recognition, 3D reconstruction, stereo imaging, augmented reality, and other computer vision applications as you fol