
Statistical Pattern Recognition (PDF)

504 pages · 2002 · 2.53 MB · English

Preview: Statistical Pattern Recognition

Statistical Pattern Recognition, Second Edition. Andrew R. Webb
Copyright © 2002 John Wiley & Sons, Ltd.
ISBNs: 0-470-84513-9 (HB); 0-470-84514-7 (PB)

Statistical Pattern Recognition
Second Edition

Andrew R. Webb
QinetiQ Ltd., Malvern, UK

First edition published by Butterworth-Heinemann.

Copyright © 2002 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Email (for orders and customer service enquiries): [email protected]
Visit our Home Page on www.wileyeurope.com or www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770571.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN 0-470-84513-9 (Cloth)
ISBN 0-470-84514-7 (Paper)

Typeset from LaTeX files produced by the author by Laserwords Private Limited, Chennai, India
Printed and bound in Great Britain by Biddles Ltd, Guildford, Surrey
This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.

To Rosemary, Samuel, Miriam, Jacob and Ethan

Contents

Preface xv
Notation xvii

1 Introduction to statistical pattern recognition 1
  1.1 Statistical pattern recognition 1
    1.1.1 Introduction 1
    1.1.2 The basic model 2
  1.2 Stages in a pattern recognition problem 3
  1.3 Issues 4
  1.4 Supervised versus unsupervised 5
  1.5 Approaches to statistical pattern recognition 6
    1.5.1 Elementary decision theory 6
    1.5.2 Discriminant functions 19
  1.6 Multiple regression 25
  1.7 Outline of book 27
  1.8 Notes and references 28
  Exercises 30

2 Density estimation – parametric 33
  2.1 Introduction 33
  2.2 Normal-based models 34
    2.2.1 Linear and quadratic discriminant functions 34
    2.2.2 Regularised discriminant analysis 37
    2.2.3 Example application study 38
    2.2.4 Further developments 40
    2.2.5 Summary 40
  2.3 Normal mixture models 41
    2.3.1 Maximum likelihood estimation via EM 41
    2.3.2 Mixture models for discrimination 45
    2.3.3 How many components? 46
    2.3.4 Example application study 47
    2.3.5 Further developments 49
    2.3.6 Summary 49
  2.4 Bayesian estimates 50
    2.4.1 Bayesian learning methods 50
    2.4.2 Markov chain Monte Carlo 55
    2.4.3 Bayesian approaches to discrimination 70
    2.4.4 Example application study 72
    2.4.5 Further developments 75
    2.4.6 Summary 75
  2.5 Application studies 75
  2.6 Summary and discussion 77
  2.7 Recommendations 77
  2.8 Notes and references 77
  Exercises 78

3 Density estimation – nonparametric 81
  3.1 Introduction 81
  3.2 Histogram method 82
    3.2.1 Data-adaptive histograms 83
    3.2.2 Independence assumption 84
    3.2.3 Lancaster models 85
    3.2.4 Maximum weight dependence trees 85
    3.2.5 Bayesian networks 88
    3.2.6 Example application study 91
    3.2.7 Further developments 91
    3.2.8 Summary 92
  3.3 k-nearest-neighbour method 93
    3.3.1 k-nearest-neighbour decision rule 93
    3.3.2 Properties of the nearest-neighbour rule 95
    3.3.3 Algorithms 95
    3.3.4 Editing techniques 98
    3.3.5 Choice of distance metric 101
    3.3.6 Example application study 102
    3.3.7 Further developments 103
    3.3.8 Summary 104
  3.4 Expansion by basis functions 105
  3.5 Kernel methods 106
    3.5.1 Choice of smoothing parameter 111
    3.5.2 Choice of kernel 113
    3.5.3 Example application study 114
    3.5.4 Further developments 115
    3.5.5 Summary 115
  3.6 Application studies 116
  3.7 Summary and discussion 119
  3.8 Recommendations 120
  3.9 Notes and references 120
  Exercises 121

4 Linear discriminant analysis 123
  4.1 Introduction 123
  4.2 Two-class algorithms 124
    4.2.1 General ideas 124
    4.2.2 Perceptron criterion 124
    4.2.3 Fisher's criterion 128
    4.2.4 Least mean squared error procedures 130
    4.2.5 Support vector machines 134
    4.2.6 Example application study 141
    4.2.7 Further developments 142
    4.2.8 Summary 142
  4.3 Multiclass algorithms 144
    4.3.1 General ideas 144
    4.3.2 Error-correction procedure 145
    4.3.3 Fisher's criterion – linear discriminant analysis 145
    4.3.4 Least mean squared error procedures 148
    4.3.5 Optimal scaling 152
    4.3.6 Regularisation 155
    4.3.7 Multiclass support vector machines 155
    4.3.8 Example application study 156
    4.3.9 Further developments 156
    4.3.10 Summary 158
  4.4 Logistic discrimination 158
    4.4.1 Two-group case 158
    4.4.2 Maximum likelihood estimation 159
    4.4.3 Multiclass logistic discrimination 161
    4.4.4 Example application study 162
    4.4.5 Further developments 163
    4.4.6 Summary 163
  4.5 Application studies 163
  4.6 Summary and discussion 164
  4.7 Recommendations 165
  4.8 Notes and references 165
  Exercises 165

5 Nonlinear discriminant analysis – kernel methods 169
  5.1 Introduction 169
  5.2 Optimisation criteria 171
    5.2.1 Least squares error measure 171
    5.2.2 Maximum likelihood 175
    5.2.3 Entropy 176
  5.3 Radial basis functions 177
    5.3.1 Introduction 177
    5.3.2 Motivation 178
    5.3.3 Specifying the model 181
    5.3.4 Radial basis function properties 187
    5.3.5 Simple radial basis function 187
    5.3.6 Example application study 187
    5.3.7 Further developments 189
    5.3.8 Summary 189
  5.4 Nonlinear support vector machines 190
    5.4.1 Types of kernel 191
    5.4.2 Model selection 192
    5.4.3 Support vector machines for regression 192
    5.4.4 Example application study 195
    5.4.5 Further developments 196
    5.4.6 Summary 197
  5.5 Application studies 197
  5.6 Summary and discussion 199
  5.7 Recommendations 199
  5.8 Notes and references 200
  Exercises 200

6 Nonlinear discriminant analysis – projection methods 203
  6.1 Introduction 203
  6.2 The multilayer perceptron 204
    6.2.1 Introduction 204
    6.2.2 Specifying the multilayer perceptron structure 205
    6.2.3 Determining the multilayer perceptron weights 205
    6.2.4 Properties 212
    6.2.5 Example application study 213
    6.2.6 Further developments 214
    6.2.7 Summary 216
  6.3 Projection pursuit 216
    6.3.1 Introduction 216
    6.3.2 Projection pursuit for discrimination 218
    6.3.3 Example application study 219
    6.3.4 Further developments 220
    6.3.5 Summary 220
  6.4 Application studies 221
  6.5 Summary and discussion 221
  6.6 Recommendations 222
  6.7 Notes and references 223
  Exercises 223

7 Tree-based methods 225
  7.1 Introduction 225
  7.2 Classification trees 225
    7.2.1 Introduction 225
    7.2.2 Classifier tree construction 228
    7.2.3 Other issues 237
    7.2.4 Example application study 239
    7.2.5 Further developments 239
    7.2.6 Summary 240
  7.3 Multivariate adaptive regression splines 241
    7.3.1 Introduction 241
    7.3.2 Recursive partitioning model 241
    7.3.3 Example application study 244
    7.3.4 Further developments 245
    7.3.5 Summary 245
  7.4 Application studies 245
  7.5 Summary and discussion 247
  7.6 Recommendations 247
  7.7 Notes and references 248
  Exercises 248

8 Performance 251
  8.1 Introduction 251
  8.2 Performance assessment 252
    8.2.1 Discriminability 252
    8.2.2 Reliability 258
    8.2.3 ROC curves for two-class rules 260
    8.2.4 Example application study 263
    8.2.5 Further developments 264
    8.2.6 Summary 265
  8.3 Comparing classifier performance 266
    8.3.1 Which technique is best? 266
    8.3.2 Statistical tests 267
    8.3.3 Comparing rules when misclassification costs are uncertain 267
    8.3.4 Example application study 269
    8.3.5 Further developments 270
    8.3.6 Summary 271
  8.4 Combining classifiers 271
    8.4.1 Introduction 271
    8.4.2 Motivation 272
    8.4.3 Characteristics of a combination scheme 275
    8.4.4 Data fusion 278
    8.4.5 Classifier combination methods 284
    8.4.6 Example application study 297
    8.4.7 Further developments 298
    8.4.8 Summary 298
  8.5 Application studies 299
  8.6 Summary and discussion 299
  8.7 Recommendations 300
  8.8 Notes and references 300
  Exercises 301

9 Feature selection and extraction 305
  9.1 Introduction 305
  9.2 Feature selection 307
    9.2.1 Feature selection criteria 308
    9.2.2 Search algorithms for feature selection 311
    9.2.3 Suboptimal search algorithms 314
    9.2.4 Example application study 317
    9.2.5 Further developments 317
    9.2.6 Summary 318
  9.3 Linear feature extraction 318
    9.3.1 Principal components analysis 319
    9.3.2 Karhunen–Loève transformation 329
    9.3.3 Factor analysis 335
    9.3.4 Example application study 342
    9.3.5 Further developments 343
    9.3.6 Summary 344
  9.4 Multidimensional scaling 344
    9.4.1 Classical scaling 345
    9.4.2 Metric multidimensional scaling 346
    9.4.3 Ordinal scaling 347
    9.4.4 Algorithms 350
    9.4.5 Multidimensional scaling for feature extraction 351
    9.4.6 Example application study 352
    9.4.7 Further developments 353
    9.4.8 Summary 353
  9.5 Application studies 354
  9.6 Summary and discussion 355
  9.7 Recommendations 355
  9.8 Notes and references 356
  Exercises 357

10 Clustering 361
  10.1 Introduction 361
  10.2 Hierarchical methods 362
    10.2.1 Single-link method 364
    10.2.2 Complete-link method 367
    10.2.3 Sum-of-squares method 368
    10.2.4 General agglomerative algorithm 368
    10.2.5 Properties of a hierarchical classification 369
    10.2.6 Example application study 370
    10.2.7 Summary 370
  10.3 Quick partitions 371
  10.4 Mixture models 372
    10.4.1 Model description 372
    10.4.2 Example application study 374
  10.5 Sum-of-squares methods 374
    10.5.1 Clustering criteria 375
    10.5.2 Clustering algorithms 376
    10.5.3 Vector quantisation 382
