Academic Press is an imprint of Elsevier
30 Corporate Drive, Suite 400
Burlington, MA 01803, USA
The Boulevard, Langford Lane
Kidlington, Oxford, OX5 1GB, UK

Copyright © 2010 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

MATLAB® is a trademark of The MathWorks, Inc., and is used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
Library of Congress Cataloging-in-Publication Data

Introduction to pattern recognition : a MATLAB® approach / Sergios Theodoridis … [et al.].
    p. cm.
"Compliment of the book Pattern recognition, 4th edition, by S. Theodoridis and K. Koutroumbas, Academic Press, 2009."
Includes bibliographical references and index.
ISBN (alk. paper)
1. Pattern recognition systems. 2. Pattern recognition systems–Mathematics. 3. Numerical analysis. 4. MATLAB. I. Theodoridis, Sergios, 1951–. II. Theodoridis, Sergios. Pattern recognition.
TK7882.P3 I65 2010
006.4–dc22
2010004354

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

For information on all Academic Press publications visit our website at www.elsevierdirect.com

Printed in the United States
10 11 12 13 14    10 9 8 7 6 5 4 3 2 1
Preface
The aim of this book is to serve pedagogic goals as a complement to the book Pattern Recognition, 4th Edition, by S. Theodoridis and K. Koutroumbas (Academic Press, 2009). It is the offspring of our experience in teaching pattern recognition for a number of years to different audiences, such as students with a good mathematical background, students who are more practice-oriented, professional engineers, and computer scientists attending short training courses. The book unfolds along two directions.
The first is to develop a set of MATLAB-based examples so that students will be able to experiment with the methods and algorithms met in the various stages of designing a pattern recognition system—that is, classifier design, feature selection and generation, and system evaluation. To this end, we have made an effort to "design" examples that will help the reader grasp the basics behind each method, as well as the respective pros and cons. In pattern recognition, there are no magic recipes that dictate which method or technique to use for a specific problem. Very often, good old and simple (in concept) techniques can compete, from an efficiency point of view, with more modern and elaborate techniques. Concerning this task of selecting the most appropriate technique, it is unfortunate that these days more and more people follow the so-called black-box approach: they try different techniques, using a related software package to play with a set of parameters, even if the real meaning of these parameters is not really understood.
Such an "unscientific" approach, which really prevents thought and creativity, also deprives the "user" of the ability to understand, explain, and interpret the obtained results. For this reason, most of the examples in this book use simulated data. Hence, one can experiment with different parameters and study the behavior of the respective method/algorithm. Having control of the data, readers will be able to "study," "investigate," and become familiar with the pros and cons of a technique. One can create data that push a technique to its "limits"—that is, to where it fails. In addition, most real-life problems are solved in high-dimensional spaces, where visualization is impossible; yet visualizing geometry is one of the most powerful and effective pedagogic tools, enabling a newcomer to the field to "see" how various methods work. The 3-dimensional space is one of the most primitive and deep-rooted experiences in the human brain, because everyone is acquainted with it and has a good feeling for and understanding of it.
The second direction is to provide a summary of the related theory, without mathematics. We have made an effort, as much as possible, to present the theory using arguments based on physical reasoning, as well as to point out the role of the various parameters and how they can influence the performance of a method/algorithm. Nevertheless, for a more thorough understanding, the mathematical formulation cannot be bypassed. It is "there" where the truly worthy secrets of a method lie, where deep understanding has its indisputable roots and grounds, where science lies. Theory and practice are interrelated—one cannot be developed without the other. This is the reason that we consider this book a complement of the previously published one. We consider it another branch, leaning toward the practical side, the other branch being the more theoretical one. Both branches are necessary to form the pattern-recognition tree, which has its roots in the work of hundreds of researchers who have tirelessly contributed, over a number of decades, both in theory and in practice.
All the MATLAB functions used throughout this book can be downloaded from the companion website for this book at www.elsevierdirect.com/9780123744869. Note that, when running the MATLAB code in the book, the results may slightly vary among different versions of MATLAB. Moreover, we have made an effort to minimize dependencies on MATLAB toolboxes, as much as possible, and have developed our own code.
Also, in spite of the careful proofreading of the book, it is still possible that some typos may have escaped. The authors would appreciate readers notifying them of any that are found, as well as suggestions related to the MATLAB code.
CHAPTER 1
Classifiers Based on Bayes Decision Theory
1.1 INTRODUCTION
In this chapter, we discuss techniques inspired by Bayes decision theory. The theoretical developments of the associated algorithms were given in [Theo09, Chapter 2]. To the newcomer in the field of pattern recognition, the chapter's algorithms and exercises are very important for developing a basic understanding and familiarity with some fundamental notions associated with classification. Most of the algorithms are simple in both structure and physical reasoning.
In a classification task, we are given a pattern and the task is to classify it into one out of c classes. The number of classes, c, is assumed to be known a priori. Each pattern is represented by a set of feature values, x(i), i = 1, 2, ..., l, which make up the l-dimensional feature vector¹ x = [x(1), x(2), ..., x(l)]^T ∈ R^l. We assume that each pattern is represented uniquely by a single feature vector and that it can belong to only one class.
Given x ∈ R^l and a set of c classes, ω_i, i = 1, 2, ..., c, Bayes theory states that

$$P(\omega_i \mid x)\, p(x) = p(x \mid \omega_i)\, P(\omega_i) \tag{1.1}$$

where

$$p(x) = \sum_{i=1}^{c} p(x \mid \omega_i)\, P(\omega_i)$$

where P(ω_i) is the a priori probability of class ω_i, i = 1, 2, ..., c; P(ω_i|x) is the a posteriori probability of class ω_i, given the value of x; p(x) is the probability density function (pdf) of x; and p(x|ω_i), i = 1, 2, ..., c, is the class-conditional pdf of x, given ω_i (sometimes called the likelihood of ω_i with respect to x).
1.2 BAYES DECISION THEORY
We are given a pattern whose class label is unknown, and we let x ≡ [x(1), x(2), ..., x(l)]^T ∈ R^l be its corresponding feature vector, which results from some measurements. Also, we let the number of possible classes be equal to c, that is, ω_1, ..., ω_c.

¹In contrast to [Theo09], vector quantities are not boldfaced here, in compliance with MATLAB notation.
According to Bayes decision theory, x is assigned to the class ω_i if

$$P(\omega_i \mid x) > P(\omega_j \mid x), \quad \forall j \neq i \tag{1.2}$$

or, taking into account Eq. (1.1) and given that p(x) is positive and the same for all classes, if

$$p(x \mid \omega_i)\, P(\omega_i) > p(x \mid \omega_j)\, P(\omega_j), \quad \forall j \neq i \tag{1.3}$$
Remark

• The Bayesian classifier is optimal in the sense that it minimizes the probability of error [Theo09, Chapter 2].
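In MATLAB terms, the rule of Eq. (1.3) amounts to multiplying likelihoods by priors and picking the largest product. A minimal sketch with made-up numbers follows; the values in cond are hypothetical stand-ins for p(x|ω_i) at some point x, and p(x) is dropped since it is common to all classes:

cond = [0.084 0.038 0.002];   % hypothetical class-conditional pdf values p(x|omega_i), c=3
P = [1/3 1/3 1/3];            % a priori probabilities P(omega_i)
post = cond.*P;               % unnormalized a posteriori probabilities, Eq. (1.1)
[pmax,class] = max(post);     % Bayes rule of Eq. (1.3): assign x to the winning class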
1.3 THE GAUSSIAN PROBABILITY DENSITY FUNCTION
The Gaussian pdf [Theo09, Section 2.4.1] is extensively used in pattern recognition because of its mathematical tractability as well as because of the central limit theorem. The latter states that the pdf of the sum of a number of statistically independent random variables tends to the Gaussian one as the number of summands tends to infinity. In practice, this is approximately true for a large enough number of summands.
The multidimensional Gaussian pdf has the form

$$p(x) = \frac{1}{(2\pi)^{l/2} |S|^{1/2}} \exp\left(-\frac{1}{2}(x-m)^T S^{-1} (x-m)\right) \tag{1.4}$$

where m = E[x] is the mean vector, S is the covariance matrix defined as S = E[(x−m)(x−m)^T], and |S| is the determinant of S.
Often we refer to the Gaussian pdf as the normal pdf, and we use the notation N(m, S). For the 1-dimensional case, x ∈ R, the above becomes

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-m)^2}{2\sigma^2}\right) \tag{1.5}$$

where σ² is the variance of the random variable x.
Example 1.3.1. Compute the value of a Gaussian pdf, N(m, S), at x_1 = [0.2, 1.3]^T and x_2 = [2.2, −1.3]^T, where

$$m = [0,\ 1]^T, \quad S = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Solution. Use the function comp_gauss_dens_val to compute the value of the Gaussian pdf. Specifically, type

m=[0 1]'; S=eye(2);
x1=[0.2 1.3]'; x2=[2.2 -1.3]';
pg1=comp_gauss_dens_val(m,S,x1);
pg2=comp_gauss_dens_val(m,S,x2);

The resulting values for pg1 and pg2 are 0.1491 and 0.001, respectively.
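The function comp_gauss_dens_val is provided with the book's companion code. For readers who want to see the mechanics, a minimal sketch of ours that evaluates Eq. (1.4) directly could look as follows; it is not necessarily identical to the distributed implementation:

function pg = comp_gauss_dens_val(m,S,x)
% Sketch: value of the Gaussian pdf of Eq. (1.4) at point x,
% given the mean vector m and the covariance matrix S.
l = length(m);   % dimensionality of the feature space
pg = exp(-0.5*(x-m)'*(S\(x-m))) / ((2*pi)^(l/2)*sqrt(det(S)));

With m, S, and x1 as above, this returns approximately 0.1491.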
Example 1.3.2. Consider a 2-class classification task in the 2-dimensional space, where the data in both classes, ω_1 and ω_2, are distributed according to the Gaussian distributions N(m_1, S_1) and N(m_2, S_2), respectively. Let

$$m_1 = [1,\ 1]^T, \quad m_2 = [3,\ 3]^T, \quad S_1 = S_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Assuming that P(ω_1) = P(ω_2) = 1/2, classify x = [1.8, 1.8]^T into ω_1 or ω_2.
Solution. Utilize the function comp_gauss_dens_val by typing

P1=0.5;
P2=0.5;
m1=[1 1]'; m2=[3 3]'; S=eye(2); x=[1.8 1.8]';
p1=P1*comp_gauss_dens_val(m1,S,x);
p2=P2*comp_gauss_dens_val(m2,S,x);

The resulting values for p1 and p2 are 0.042 and 0.0189, respectively, and x is classified to ω_1 according to the Bayesian classifier.
Exercise 1.3.1

Repeat Example 1.3.2 for P(ω_1) = 1/6 and P(ω_2) = 5/6, and for P(ω_1) = 5/6 and P(ω_2) = 1/6. Observe the dependence of the classification result on the a priori probabilities [Theo09, Section 2.4.2].
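As a starting point, only the priors need to change; for instance, for the first pair of a priori probabilities type (with m1, m2, S, and x as in Example 1.3.2):

P1=1/6;
P2=5/6;
p1=P1*comp_gauss_dens_val(m1,S,x);
p2=P2*comp_gauss_dens_val(m2,S,x);

and compare p1 against p2 as before.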
Example 1.3.3. Generate N = 500 2-dimensional data points that are distributed according to the Gaussian distribution N(m, S), with mean m = [0, 0]^T and covariance matrix

$$S = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}$$

for the following cases:

(a) σ_1² = σ_2² = 1, σ_12 = 0
(b) σ_1² = σ_2² = 0.2, σ_12 = 0
(c) σ_1² = σ_2² = 2, σ_12 = 0
(d) σ_1² = 0.2, σ_2² = 2, σ_12 = 0
(e) σ_1² = 2, σ_2² = 0.2, σ_12 = 0
(f) σ_1² = σ_2² = 1, σ_12 = 0.5
(g) σ_1² = 0.3, σ_2² = 2, σ_12 = 0.5
(h) σ_1² = 0.3, σ_2² = 2, σ_12 = −0.5

Plot each data set and comment on the shape of the clusters formed by the data points.
Solution. To generate the first data set, use the built-in MATLAB function mvnrnd by typing

randn('seed',0) %Initialization of the randn function
m=[0 0]';
S=[1 0;0 1];
N=500;
X = mvnrnd(m,S,N)';

where X is the matrix that contains the data vectors in its columns.

To ensure reproducibility of the results, the randn MATLAB function, which generates random numbers following the Gaussian distribution with zero mean and unit variance, is initialized to a specific number via the first command (in the previous code, randn is called by the mvnrnd MATLAB function).

To plot the data set, type

figure(1), plot(X(1,:),X(2,:),'.');
figure(1), axis equal
figure(1), axis([-7 7 -7 7])
Working similarly for the second data set, type

m=[0 0]';
S=[0.2 0;0 0.2];
N=500;
X = mvnrnd(m,S,N)';
figure(2), plot(X(1,:),X(2,:),'.');
figure(2), axis equal
figure(2), axis([-7 7 -7 7])
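The rest of the data sets are obtained similarly. One compact alternative, sketched here, is to collect the eight covariance matrices of the example in a cell array and loop over them:

m=[0 0]'; N=500;
S_all={[1 0;0 1],[0.2 0;0 0.2],[2 0;0 2],[0.2 0;0 2],...
    [2 0;0 0.2],[1 0.5;0.5 1],[0.3 0.5;0.5 2],[0.3 -0.5;-0.5 2]};
for i=1:8
    randn('seed',0)             % same initialization for all cases
    X = mvnrnd(m,S_all{i},N)';  % data vectors in the columns of X
    figure(i), plot(X(1,:),X(2,:),'.');
    figure(i), axis equal
    figure(i), axis([-7 7 -7 7])
end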
All eight data sets are depicted in Figure 1.1, from which one can observe the following:

• When the two coordinates of x are uncorrelated (σ_12 = 0) and their variances are equal, the data vectors form "spherically shaped" clusters (Figure 1.1(a–c)).

• When the two coordinates of x are uncorrelated (σ_12 = 0) and their variances are unequal, the data vectors form "ellipsoidally shaped" clusters. The coordinate with the highest variance corresponds to the "major axis" of the ellipsoidally shaped cluster, while the coordinate with the lowest variance corresponds to its "minor axis." In addition, the major and minor axes of the cluster are parallel to the axes (Figure 1.1(d, e)).
[Figure 1.1 appears here: eight scatter plots, panels (a)–(h), one per covariance case of the example, each drawn with equal axes over the range −6 to 6.]

FIGURE 1.1
Eight data sets of Example 1.3.3.
• When the two coordinates of x are correlated (σ_12 ≠ 0), the major and minor axes of the ellipsoidally shaped cluster are no longer parallel to the axes. The degree of rotation with respect to the axes depends on the value of σ_12 (Figure 1.1(f–h)). The effect of the value of σ_12, whether positive or negative, is demonstrated in Figure 1.1(g, h). Finally, as can be seen by comparing Figure 1.1(a, f), when σ_12 ≠ 0, the data form ellipsoidally shaped clusters despite the fact that the variances of each coordinate are the same.
1.4 MINIMUM DISTANCE CLASSIFIERS
1.4.1 The Euclidean Distance Classifier
The optimal Bayesian classifier is significantly simplified under the following assumptions:

• The classes are equiprobable.
• The data in all classes follow Gaussian distributions.
• The covariance matrix is the same for all classes.
• The covariance matrix is diagonal and all elements across the diagonal are equal. That is, S = σ²I, where I is the identity matrix.
Under these assumptions, it turns out that the optimal Bayesian classifier is equivalent to the minimum Euclidean distance classifier. That is, given an unknown x, assign it to class ω_i if

$$\|x - m_i\| \equiv \sqrt{(x - m_i)^T (x - m_i)} < \|x - m_j\|, \quad \forall j \neq i$$

It must be stated that the Euclidean classifier is often used, even if we know that the previously stated assumptions are not valid, because of its simplicity. It assigns a pattern to the class whose mean is closest to it with respect to the Euclidean norm.
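The companion code provides a function of this kind; purely as an illustration, a minimal sketch of a Euclidean distance classifier, assuming the class means are stored as the columns of a matrix m (the function name below is ours, not the book's), could be:

function class = euclidean_classifier_sketch(m,x)
% Sketch: m is an l x c matrix whose i-th column is the mean of class omega_i;
% x is the l x 1 feature vector to be classified.
[l,c] = size(m);
d = zeros(c,1);
for i = 1:c
    d(i) = sqrt((x-m(:,i))'*(x-m(:,i)));  % Euclidean distance from x to m_i
end
[dmin,class] = min(d);  % assign x to the class of the closest mean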
1.4.2 The Mahalanobis Distance Classifier
If one relaxes the assumptions required by the Euclidean classifier and removes the last one, the one requiring the covariance matrix to be diagonal and with equal elements, the optimal Bayesian classifier becomes equivalent to the minimum Mahalanobis distance classifier. That is, given an unknown x, it is assigned to class ω_i if

$$\sqrt{(x - m_i)^T S^{-1} (x - m_i)} < \sqrt{(x - m_j)^T S^{-1} (x - m_j)}, \quad \forall j \neq i$$

where S is the common covariance matrix. The presence of the covariance matrix accounts for the shape of the Gaussians [Theo09, Section 2.4.2].
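Similarly, a minimal sketch of a Mahalanobis distance classifier (again ours, for illustration only):

function class = mahalanobis_classifier_sketch(m,S,x)
% Sketch: m is an l x c matrix of class means, S the common l x l
% covariance matrix, x the l x 1 vector to be classified.
[l,c] = size(m);
d = zeros(c,1);
for i = 1:c
    v = x - m(:,i);
    d(i) = sqrt(v'*(S\v));  % Mahalanobis distance from x to m_i
end
[dmin,class] = min(d);  % assign x to the class of the smallest distance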
Example 1.4.1. Consider a 2-class classification task in the 3-dimensional space, where the two classes, ω_1 and ω_2, are modeled by Gaussian distributions with means m_1 = [0, 0, 0]^T and m_2 = [0.5, 0.5, 0.5]^T, respectively. Assume the two classes to be equiprobable. The covariance matrix for both distributions is

$$S = \begin{bmatrix} 0.8 & 0.01 & 0.01 \\ 0.01 & 0.2 & 0.01 \\ 0.01 & 0.01 & 0.2 \end{bmatrix}$$

Given the point x = [0.1, 0.5, 0.1]^T, classify x (1) according to the Euclidean distance classifier and (2) according to the Mahalanobis distance classifier. Comment on the results.
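Using the two sketches given earlier (or the corresponding functions of the companion code), the example can be worked out along the following lines; note that the two rules need not agree here, since S is far from the form σ²I assumed by the Euclidean classifier:

x=[0.1 0.5 0.1]'; m1=[0 0 0]'; m2=[0.5 0.5 0.5]';
m=[m1 m2];    % class means as columns
S=[0.8 0.01 0.01; 0.01 0.2 0.01; 0.01 0.01 0.2];
class_E = euclidean_classifier_sketch(m,x)     % decision of rule (1)
class_M = mahalanobis_classifier_sketch(m,S,x) % decision of rule (2)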