
Ten Lectures on Statistical and Structural Pattern Recognition (PDF)

531 pages · 2002 · 17.018 MB · English

Preview: Ten Lectures on Statistical and Structural Pattern Recognition

Ten Lectures on Statistical and Structural Pattern Recognition

Computational Imaging and Vision, Volume 24

Managing Editor
MAX A. VIERGEVER, Utrecht University, Utrecht, The Netherlands

Editorial Board
RUZENA BAJCSY, University of Pennsylvania, Philadelphia, USA
MIKE BRADY, Oxford University, Oxford, UK
OLIVIER D. FAUGERAS, INRIA, Sophia-Antipolis, France
JAN J. KOENDERINK, Utrecht University, Utrecht, The Netherlands
STEPHEN M. PIZER, University of North Carolina, Chapel Hill, USA
SABURO TSUJI, Wakayama University, Wakayama, Japan
STEVEN W. ZUCKER, McGill University, Montreal, Canada

by
Michail I. Schlesinger, Ukrainian Academy of Sciences, Kiev, Ukraine
and
Václav Hlaváč, Czech Technical University, Prague, Czech Republic

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.

A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 978-90-481-6027-3
ISBN 978-94-017-3217-8 (eBook)
DOI 10.1007/978-94-017-3217-8

This is a completely revised and updated translation of Deset přednášek z teorie statistického a strukturního rozpoznávání by M. I. Schlesinger and V. Hlaváč, published by Vydavatelství ČVUT, Prague, 1999. Translated by the authors.

This book was typeset by Vít Zýka in LaTeX using Computer Modern 10/12 pt. Printed on acid-free paper.

All Rights Reserved
© 2002 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 2002
Softcover reprint of the hardcover 1st edition 2002
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Contents

Preface  xi
  Preface to the English edition  xi
  A letter from the doctoral student Jiří Pecha prior to publication of the lectures  xii
  A letter from the authors to the doctoral student Jiří Pecha  xiii
  Basic concepts and notations  xv
Acknowledgements  xix

Lecture 1  Bayesian statistical decision making  1
  1.1 Introduction to the analysis of the Bayesian task  1
  1.2 Formulation of the Bayesian task  1
  1.3 Two properties of Bayesian strategies  3
  1.4 Two particular cases of the Bayesian task  7
    1.4.1 Probability of the wrong estimate of the state  7
    1.4.2 Bayesian strategy with possible rejection  9
  1.5 Discussion  11
  1.6 Bibliographical notes  22

Lecture 2  Non-Bayesian statistical decision making  25
  2.1 Severe restrictions of the Bayesian approach  25
    2.1.1 Penalty function  25
    2.1.2 A priori probability of situations  26
    2.1.3 Conditional probabilities of observations  27
  2.2 Formulation of the known and new non-Bayesian tasks  28
    2.2.1 Neyman-Pearson task  28
    2.2.2 Generalised task with two dangerous states  31
    2.2.3 Minimax task  31
    2.2.4 Wald task  32
    2.2.5 Statistical decision tasks with non-random interventions  33
  2.3 The pair of dual linear programming tasks, properties and solutions  35
  2.4 The solution of non-Bayesian tasks using duality theorems  40
    2.4.1 Solution of the Neyman-Pearson task  41
    2.4.2 Solution of generalised Neyman-Pearson task with two dangerous states  44
    2.4.3 Solution of the minimax task  46
    2.4.4 Solution of Wald task for the two states case  48
    2.4.5 Solution of Wald task in the case of more states  50
    2.4.6 Testing of complex random hypotheses  52
    2.4.7 Testing of complex non-random hypotheses  53
  2.5 Comments on non-Bayesian tasks  53
  2.6 Discussion  54
  2.7 Bibliographical notes  71

Lecture 3  Two statistical models of the recognised object  73
  3.1 Conditional independence of features  73
  3.2 Gaussian probability distribution  75
  3.3 Discussion  77
  3.4 Bibliographical notes  99

Lecture 4  Learning in pattern recognition  101
  4.1 Myths about learning in pattern recognition  101
  4.2 Three formulations of learning tasks in pattern recognition  102
    4.2.1 Learning according to the maximal likelihood  104
    4.2.2 Learning according to a non-random training set  105
    4.2.3 Learning by minimisation of empirical risk  106
  4.3 Basic concepts and questions of the statistical theory of learning  108
    4.3.1 Informal description of learning in pattern recognition  108
    4.3.2 Foundations of the statistical learning theory according to Chervonenkis and Vapnik  113
  4.4 Critical view of the statistical learning theory  120
  4.5 Outlines of deterministic learning  122
  4.6 Discussion  127
  4.7 Bibliographical notes  136

Lecture 5  Linear discriminant function  137
  5.1 Introductory notes on linear decomposition  137
  5.2 Guide through the topic of the lecture  138
  5.3 Anderson tasks  141
    5.3.1 Equivalent formulation of generalised Anderson task  141
    5.3.2 Informal analysis of generalised Anderson task  142
    5.3.3 Definition of auxiliary concepts for Anderson tasks  145
    5.3.4 Solution of Anderson original task  147
    5.3.5 Formal analysis of generalised Anderson task  150
    5.3.6 Outline of a procedure for solving generalised Anderson task  157
  5.4 Linear separation of finite sets of points  159
    5.4.1 Formulation of tasks and their analysis  159
    5.4.2 Algorithms for linear separation of finite sets of points  163
    5.4.3 Algorithm for ε-optimal separation of finite sets of points by means of the hyperplane  167
    5.4.4 Construction of Fisher classifiers by modifying Kozinec and perceptron algorithms  169
    5.4.5 Further modification of Kozinec algorithms  171
  5.5 Solution of the generalised Anderson task  175
    5.5.1 ε-solution of Anderson task  175
    5.5.2 Linear separation of infinite sets of points  179
  5.6 Discussion  182
  5.7 Link to a toolbox  213
  5.8 Bibliographical notes  213

Lecture 6  Unsupervised learning  215
  6.1 Introductory comments on the specific structure of the lecture  215
  6.2 Preliminary and informal definition of unsupervised learning  217
  6.3 Unsupervised learning in a perceptron  219
  6.4 Empirical Bayesian approach after H. Robbins  226
  6.5 Quadratic clustering and formulation of a general clustering task  232
  6.6 Unsupervised learning algorithms and their analysis  238
    6.6.1 Formulation of a recognition task  238
    6.6.2 Formulation of a learning task  238
    6.6.3 Formulation of an unsupervised learning task  240
    6.6.4 Unsupervised learning algorithm  241
    6.6.5 Analysis of the unsupervised learning algorithm  242
    6.6.6 Algorithm solving Robbins task and its analysis  251
  6.7 Discussion  253
  6.8 Link to a toolbox  273
  6.9 Bibliographical notes  274

Lecture 7  Mutual relationship of statistical and structural recognition  275
  7.1 Statistical recognition and its application areas  275
  7.2 Why is structural recognition necessary for image recognition?  277
    7.2.1 Set of observations  277
    7.2.2 Set of hidden parameter values for an image  280
    7.2.3 The role of learning and unsupervised learning in image recognition  281
  7.3 Main concepts necessary for structural analysis  284
  7.4 Discussion  288
  7.5 Bibliographical notes  305

Lecture 8  Recognition of Markovian sequences  307
  8.1 Introductory notes on sequences  307
  8.2 Markovian statistical model of a recognised object  308
  8.3 Recognition of the stochastic automaton  312
    8.3.1 Recognition of the stochastic automaton; problem formulation  312
    8.3.2 Algorithm for a stochastic automaton recognition  313
    8.3.3 Matrix representation of the calculation procedure  314
    8.3.4 Statistical interpretation of matrix multiplication  316
    8.3.5 Recognition of the Markovian object from incomplete data  318
  8.4 The most probable sequence of hidden parameters  321
    8.4.1 Difference between recognition of an object as a whole and recognition of parts that form the object  321
    8.4.2 Formulation of a task seeking the most probable sequence of states  321
    8.4.3 Representation of a task as seeking the shortest path in a graph  321
    8.4.4 Seeking the shortest path in a graph describing the task  323
    8.4.5 On the necessity of formal task analysis  326
    8.4.6 Generalised matrix multiplications  327
    8.4.7 Seeking the most probable subsequence of states  330
  8.5 Seeking sequences composed of the most probable hidden parameters  333
  8.6 Markovian objects with acyclic structure  338
    8.6.1 Statistical model of an object  338
    8.6.2 Calculating the probability of an observation  339
    8.6.3 The most probable ensemble of hidden parameters  343
  8.7 Formulation of supervised and unsupervised learning tasks  344
    8.7.1 The maximum likelihood estimation of a model during learning  345
    8.7.2 Minimax estimate of the model  345
    8.7.3 Tuning of the recognition algorithm  346
    8.7.4 Task of unsupervised learning  347
  8.8 Maximum likelihood estimate of the model  347
  8.9 Minimax estimate of a statistical model  353
    8.9.1 Formulation of an algorithm and its properties  353
    8.9.2 Analysis of a minimax estimate  356
    8.9.3 Proof of the minimax estimate algorithm of a Markovian model  366
  8.10 Tuning the algorithm that recognises sequences  366
  8.11 The maximum likelihood estimate of statistical model  368
  8.12 Discussion  372
  8.13 Link to a toolbox  395
  8.14 Bibliographical notes  395

Lecture 9  Regular languages and corresponding pattern recognition tasks  397
  9.1 Regular languages  397
  9.2 Other ways to express regular languages  399
    9.2.1 Regular languages and automata  399
    9.2.2 Regular languages and grammars  400
    9.2.3 Regular languages and regular expressions  401
    9.2.4 Example of a regular language expressed in different ways  402
  9.3 Regular languages respecting faults; best and exact matching  404
    9.3.1 Fuzzy automata and languages  405
    9.3.2 Penalised automata and corresponding languages  406
    9.3.3 Simple best matching problem  407
  9.4 Partial conclusion after one part of the lecture  409
  9.5 Levenstein approximation of a sentence  410
    9.5.1 Preliminary formulation of the task  410
    9.5.2 Levenstein dissimilarity  411
    9.5.3 Known algorithm calculating Levenstein dissimilarity  412
    9.5.4 Modified definition of Levenstein dissimilarity and its properties  414
    9.5.5 Formulation of the problem and comments to it  417
    9.5.6 Formulation of main results and comments to them  418
    9.5.7 Generalised convolutions and their properties  420
    9.5.8 Formulation of a task and main results in convolution form  427
    9.5.9 Proof of the main result of this lecture  429
    9.5.10 Nonconvolution interpretation of the main result  440
  9.6 Discussion  443
  9.7 Link to a toolbox  477
  9.8 Bibliographical notes  477

Lecture 10  Context-free languages, their 2-D generalisation, related tasks  479
  10.1 Introductory notes  479
  10.2 Informal explanation of two-dimensional grammars and languages  480
  10.3 Two-dimensional context-free grammars and languages  484
  10.4 Exact matching problem. Generalised algorithm of C-Y-K  486
  10.5 General structural construction  489
    10.5.1 Structural construction defining observed sets  490
    10.5.2 Basic problem in structural recognition of images  493
    10.5.3 Computational procedure for solving the basic problem  494
  10.6 Discussion  498
  10.7 Bibliographical notes  505

Bibliography  507
Index  514

Preface

Preface to the English edition

This monograph, Ten Lectures on Statistical and Structural Pattern Recognition, uncovers the close relationship between various well-known pattern recognition problems that have so far been considered independent. These relationships became apparent when formal procedures addressing not only known problems but also their generalisations were discovered. The generalised problem formulations were analysed mathematically and unified algorithms were found.

The book unifies two main streams in pattern recognition, the statistical and the structural one. In addition to this bridging on the uppermost level, the book mentions several other unexpected relations within statistical and structural methods.

The monograph is intended for experts, for students, as well as for those who want to enter the field of pattern recognition. The theory is built up from scratch with almost no assumptions about any prior knowledge of the reader. Even when rigorous mathematical language is used, we make an effort to keep the text easy to comprehend. This approach makes the book suitable for students at the beginning of their scientific career. Basic building blocks are explained in the style of an accessible intellectual exercise, thus promoting good practice in reading mathematical texts. The paradoxes, beauty, and pitfalls of scientific research are shown on examples from pattern recognition. Each lecture is amended by a discussion with an inquisitive student that elucidates and deepens the explanation, providing additional pointers to computational procedures and to deep-rooted errors.

We have tried to formulate individual pattern recognition problems clearly and cleanly, to find solutions, and to prove their properties. We hope that this approach will attract mathematically inclined people to pattern recognition, which is often not the case when they open more practically oriented literature. The precisely defined domain and behaviour of a method can be very substantial for the user who creates a complicated machine or algorithm from simpler modules.
