UNIVERSITY OF TECHNOLOGY SYDNEY
DOCTORAL THESIS
Feature Fusion, Feature Selection and Local
N-ary Patterns for Object Recognition and
Image Classification
Author: Sheng WANG
Supervisors: Qiang WU, Xiangjian HE
A thesis submitted in fulfilment of the requirements
for the degree of Doctor of Philosophy
in the
School of Computing and Communications
Faculty of Engineering and Information Technology
January 2015
CERTIFICATE OF ORIGINAL AUTHORSHIP
I certify that the work in this thesis has not previously been submitted for a degree nor has it been submitted as part of requirements for a degree except as fully acknowledged within the text.
I also certify that the thesis has been written by me. Any help that I have received in my research work and the preparation of the thesis itself has been acknowledged. In addition, I certify that all information sources and literature used are indicated in the thesis.
Signature of Student:
Date:
“Concern for man and his fate must always form the chief interest of all technical endeavors. Never forget this in the midst of your diagrams and equations.”
Albert Einstein
UNIVERSITY OF TECHNOLOGY, SYDNEY
Abstract
Faculty of Engineering and Information Technology
School of Computing and Communications
Doctor of Philosophy
Feature Fusion, Feature Selection and Local N-ary Patterns for Object Recognition and Image Classification
by Sheng WANG
Object recognition is one of the most fundamental topics in computer vision. Over the past years, it has attracted the interest of both academics working in computer science and professionals working in the information technology (IT) industry. The popularity of object recognition is evidenced by the sophisticated theories it has motivated in science and its widespread applications in industry. Nowadays, with more powerful machine learning tools (both hardware and software) and the huge amount of information (data) readily available, higher expectations are imposed on object recognition. At its early stage in the 1990s, the task of object recognition could be as simple as differentiating between the object of interest and non-objects of interest in a single still image. Currently, the task of object recognition may also include the segmentation and labeling of different image regions (i.e., assigning each segmented image region a meaningful label based on the objects appearing in that region), and then using computer programs to infer the scene of the overall image based on those segmented regions. The original two-class classification problem has thus become more complex, evolving toward a multi-class classification problem. In this thesis, contributions to object recognition are made in two aspects: improvements using feature fusion and improvements using feature selection. Three examples are given in this thesis to illustrate three different feature fusion methods: descriptor concatenation (low-level fusion), confidence value escalation (mid-level fusion) and the coarse-to-fine framework (high-level fusion). Two examples are provided to demonstrate the ideas of feature selection: optimal descriptor selection and improved classifier selection.
Feature extraction plays a key role in object recognition because it is the first and also the most important step. Considering the overall object recognition process, machine learning tools serve the purpose of finding distinctive features in the visual data. Given distinctive features, object recognition is readily achievable (e.g., a simple threshold function can be used to classify feature descriptors). The proposed Local N-ary Pattern (LNP) texture feature contributes to both feature extraction and texture classification. The distinctive LNP feature generalizes the texture feature extraction process and improves texture classification. Concretely, the local binary pattern (LBP) is the special case of LNP with n = 2, and the texture spectrum is the special case of LNP with n = 3. The proposed LNP representation has been shown to outperform the popular LBP and one of the LBP's most successful extensions, the local ternary pattern (LTP), for texture classification.
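To make the n-ary generalization concrete, the sketch below (in Matlab, the language used in Appendix B) encodes a single 3x3 neighbourhood as a base-n integer. The function name lnp_code, the threshold parameter t and the sample values are illustrative assumptions for this sketch, not the implementation reported in Chapter 5.

function code = lnp_code(patch, n, t)
% LNP_CODE  Minimal sketch of local n-ary pattern encoding on a 3x3
% neighbourhood (illustrative only).
%   patch : 3x3 grayscale neighbourhood (double)
%   n     : number of quantisation levels (2 reduces to LBP,
%           3 reduces to a texture-spectrum-style code)
%   t     : ascending threshold(s) separating the n levels, length n-1
    c  = patch(2, 2);                        % centre pixel
    nb = patch([1 4 7 8 9 6 3 2]);           % 8 neighbours, clockwise from top-left
    code = 0;
    for k = 1:numel(nb)
        d = nb(k) - c;                       % signed difference to the centre
        level = sum(d >= t);                 % quantised digit in {0, ..., n-1}
        code = code + level * n^(k - 1);     % base-n positional weighting
    end
end

% Example usage (hypothetical values):
%   patch = [10 20 30; 40 25 60; 70 80 90];
%   lbp = lnp_code(patch, 2, 0);             % n = 2: classic LBP code
%   ts  = lnp_code(patch, 3, [-5 5]);        % n = 3: a ternary code

With n = 2 and a single threshold at zero the returned code lies in [0, 255], as for the classic LBP; with n = 3 it lies in [0, 6560], matching the number of texture units in the texture spectrum.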
Acknowledgements
I wish to thank Associate Professor Qiang WU and Professor Xiangjian HE, my principal supervisor and co-supervisor, for their many suggestions and constant encouragement, help and support during my candidature. I gratefully acknowledge the invaluable discussions with them, and I have had the honour of studying and working with them over the past four years and nine months, an experience that is stamped indelibly in my life.
I appreciate the support of the International Research Scholarship (IRS) provided by the Faculty of Engineering and Information Technology (FEIT), University of Technology, Sydney (UTS). I also appreciate the financial support for my living expenses and for attending international conferences, received from the FEIT and the UTS Vice-Chancellor's Conference Fund.
I wish to thank my fellow colleagues and the staff of the faculty for providing various assistance towards the completion of this research work. In particular, I thank Wenjing Jia and Min Xu for their invaluable help and support.
Last but not least, I would like to thank my parents for their understanding and support. This thesis could not have been completed without their encouragement and financial assistance.
Contents
CERTIFICATE OF ORIGINAL AUTHORSHIP  i
Abstract  iii
Acknowledgements  v
Contents  vi
List of Figures  x
List of Tables  xiv
Abbreviations  xv
1 Introduction  1
1.1 Background Information  2
1.1.1 General Object Recognition Framework with Supervised Learning Methods  3
1.1.2 Typical Classification Methods  5
1.1.2.1 Boosting Method and AdaBoost Classifier  5
1.1.2.2 Naive Bayes Classifier  8
1.1.2.3 SVM Classifier  9
1.1.3 Example Datasets in Object Recognition  11
1.1.4 Evaluation Methods and Measurements  12
1.1.5 Summary  15
1.2 List of Publications  16
1.2.1 Journal Articles  16
1.2.2 Conference Proceedings  16
1.3 Thesis Organization  17
2 Summary of Feature Extraction Methods  18
2.1 Over-complete Template Based Feature Description  19
2.1.1 Haar-like Feature for Face Detection  19
2.1.2 Histogram Distance of Haar Regions for Object Recognition  22
2.1.3 Edgelet Features for Pedestrian Detection  23
2.2 Sparse Key-point Based Feature Description  26
2.2.1 Image Matching Using Scale Invariant Feature Transform  26
2.3 Dense Spatial Sub-block Based Feature Description  32
2.3.1 Dense SIFT Feature for Image Alignment and Face Recognition  32
2.3.2 Histogram of Oriented Gradient Feature for Pedestrian Detection  36
2.3.3 Density Variance Feature for License Plate Detection  42
2.4 Hybrid Filter Response Based Feature Description  43
2.4.1 Object Bank Representation for Scene Classification  43
2.5 Kernel Based Feature Description  45
2.5.1 Locally Adaptive Regression Kernels for Car Detection  46
2.6 Summary  48
3 Feature Fusion for Object Recognition  51
3.1 Concatenation of Histogram Distances Between Intensity Histograms of Different Spatial Sub-blocks  52
3.1.1 Exploring Relationships Between Different Spatial Sub-blocks  53
3.1.2 Correlated Histogram of Haar Regions  54
3.1.2.1 Original HDHR Distance  55
3.1.2.2 Minkowski City Block L1 Manhattan Distance  56
3.1.2.3 Minkowski Euclidean L2 Euclidean Distance  56
3.1.3 Experimental Results  57
3.2 Multi-scale Mid-level Fusion Based on Escalating Confidence Values  60
3.2.1 Multi-scale Gaussian Smoothing  61
3.2.2 Adaptive Combination of Weak Learners Based on Edgelet  63
3.2.3 Concatenation of Confidence Values Across Multi-scale Spaces  66
3.2.4 Experimental Results  67
3.3 Coarse-to-fine Fusion Based on Fast Region of Interest Selection and Discriminative Learning  69
3.3.1 Fast Region of Interest Selection in Coarse Stage  70
3.3.2 Boosting Distinctive DSIFT Features in Fine Stage  71
3.3.3 Experimental Results  74
3.4 Summary  76
4 Feature Selection for Image Classification and Object Recognition  79
4.1 A Compact Descriptor Based on Optimal Max Pooling Orientation  80
4.1.1 A Simplified Strip-like Spatial Configuration  82
4.1.2 Selecting Optimal Strip-like Partition Scheme Based on SVM Classification Results  85
4.1.3 Experimental Results  87
4.2 An Improved Classifier which Requires Fewer Training Examples  95
4.2.1 An Observation from Using Insufficient Training Examples  96
4.2.2 An Improved AdaBoost Algorithm with Spatial Constraints  97
4.2.3 Experimental Results  103
4.3 Summary  105
5 Local N-ary Pattern for Texture Image Classification  109
5.1 More Distinctive Texture Features in High Dimension Feature Space  110
5.1.1 Mapping Local Patterns to Integer Values  111
5.1.2 Generalization of Feature Extraction to The Bachet De Meziriac Weight Problem  114
5.1.3 Distinctive Local Patterns of The N-ary Coding Scheme  120
5.1.3.1 Distinctiveness of LNP Compared to LTP  120
5.1.3.2 An Observation of The Statistical Characteristics of The Local N-ary Representation  121
5.1.3.3 Rotation Invariant and Uniform Patterns  125
Uniform Patterns  126
Rotation Invariant Patterns  126
5.1.3.4 Texture Classification  127
One-against-rest SVM Classifier  127
Nearest Neighbor Classifier  128
5.1.4 Experimental Results  128
5.2 Summary  133
6 Conclusions and Future Works  134
6.1 Conclusions  134
6.1.1 Improved Object Recognition Based on Feature Fusion  135
6.1.1.1 Concatenating Distances Between Intensity Histograms of Different Spatial Sub-blocks  135
6.1.1.2 Multi-scale Mid-level Fusion Based on Escalating Confidence Values  136
6.1.1.3 Coarse-to-fine Fusion Based on Fast Region of Interest Selection and Distinctive Features  138
6.1.2 Improved Object Recognition Based on Feature Selection  139
6.1.2.1 A Compact Descriptor Based on Optimal Max Pooling Orientation  140
6.1.2.2 A Classifier which Requires Fewer Training Examples  141
6.1.3 Improved Texture Classification Based on LNP  142
6.2 Future Works  143
A The Generalized Bachet De Meziriac Weight Problem  146
A.1 A Generalization to The BMWP  146
A.2 Proof of Proposition 1 (Proof by Contradiction)  147
A.2.1 Details of The Contradiction  148
A.3 Proof of Proposition 2 (Proof by Mathematical Induction)  148
A.3.1 When n − 1 Is An Even Number  148
A.3.1.1 Basis  148
A.3.1.2 Inductive Step  149
A.3.2 When n − 1 Is An Odd Number  150
A.3.2.1 Basis  150
A.3.2.2 Inductive Step  151
B Some Implementations in Matlab  153
B.1 Recursive SPM Max Pooling  153
B.2 Edgelet Feature Extraction  154
C List of Technique Abbreviation Terms  157
Bibliography  161