
Machine Learning for Audio, Image and Video Analysis: Theory and Applications (Advanced Information and Knowledge Processing)

484 Pages·2007·5.78 MB·English

Preview Machine Learning for Audio, Image and Video Analysis: Theory and Applications (Advanced Information and Knowledge Processing)

Advanced Information and Knowledge Processing

Series Editors
Professor Lakhmi Jain
Professor Xindong Wu

Also in this series

Gregoris Mentzas, Dimitris Apostolou, Andreas Abecker and Ron Young: Knowledge Asset Management, 1-85233-583-1
Yun-Heh Chen-Burger and Dave Robertson: Automating Business Modelling, 1-85233-835-0
Dirk Husmeier, Richard Dybowski and Stephen Roberts (Eds): Probabilistic Modeling in Bioinformatics and Medical Informatics, 1-85233-778-8
Michalis Vazirgiannis, Maria Halkidi and Dimitrios Gunopulos: Uncertainty Handling and Quality Assessment in Data Mining, 1-85233-655-2
Ajith Abraham, Lakhmi Jain and Robert Goldberg (Eds): Evolutionary Multiobjective Optimization, 1-85233-787-7
Asunción Gómez-Pérez, Mariano Fernández-López and Oscar Corcho: Ontological Engineering, 1-85233-551-3
K.C. Tan, E.F. Khor and T.H. Lee: Multiobjective Evolutionary Algorithms and Applications, 1-85233-836-9
Arno Scharl (Ed.): Environmental Online Communication, 1-85233-783-4
Shichao Zhang, Chengqi Zhang and Xindong Wu: Knowledge Discovery in Multiple Databases, 1-85233-703-6
Nikhil R. Pal and Lakhmi Jain (Eds): Advanced Techniques in Knowledge Discovery and Data Mining, 1-85233-867-9
Jason T.L. Wang, Mohammed J. Zaki, Hannu T.T. Toivonen and Dennis Shasha (Eds): Data Mining in Bioinformatics, 1-85233-671-4
Amit Konar and Lakhmi Jain: Cognitive Engineering, 1-85233-975-6
Miroslav Kárný (Ed.): Optimized Bayesian Dynamic Advising, 1-85233-928-4
C.C. Ko, Ben M. Chen and Jianping Chen: Creating Web-based Laboratories, 1-85233-837-7
Yannis Manolopoulos, Alexandros Nanopoulos, Apostolos N. Papadopoulos and Yannis Theodoridis: R-trees: Theory and Applications, 1-85233-977-2
Manuel Graña, Richard Duro, Alicia d’Anjou and Paul P. Wang (Eds): Information Processing with Evolutionary Algorithms, 1-85233-886-0
Sanghamitra Bandyopadhyay, Ujjwal Maulik, Lawrence B. Holder and Diane J. Cook (Eds): Advanced Methods for Knowledge Discovery from Complex Data, 1-85233-989-6
Colin Fyfe: Hebbian Learning and Negative Feedback Networks, 1-85233-883-0
Marcus A. Maloof (Ed.): Machine Learning and Data Mining for Computer Security, 1-84628-029-X
Arno Scharl and Klaus Tochtermann (Eds): The Geospatial Web, 1-84628-826-5
Ngoc Thanh Nguyen: Advanced Methods for Inconsistent Knowledge Management, 1-84628-888-3
Sifeng Liu and Yi Lin: Grey Information, 1-85233-995-0
Vasile Palade, Cosmin Danut Bocaniala and Lakhmi Jain (Eds): Computational Intelligence in Fault Diagnosis, 1-84628-343-4
Mikhail Prokopenko (Ed.): Advances in Applied Self-organizing Systems, 978-1-84628-981-1
Andras Kornai: Mathematical Linguistics, 978-1-84628-985-9
Mitra Basu and Tin Kam Ho (Eds): Data Complexity in Pattern Recognition, 1-84628-171-7
Amnon Meisels: Distributed Search by Constrained Agents, 978-1-84800-039-1
Samuel Pierre (Ed.): E-learning Networked Environments and Architectures, 1-84628-351-5

Francesco Camastra • Alessandro Vinciarelli
Machine Learning for Audio, Image and Video Analysis
Theory and Applications

Francesco Camastra, PhD
Polo Universitario Guglielmo Marconi, University of Pisa, Italy

Alessandro Vinciarelli, PhD
IDIAP Research Institute, Martigny, Switzerland

AI&KP ISSN 1610-3947
ISBN: 978-1-84800-006-3
e-ISBN: 978-1-84800-007-0
British Library Cataloguing in Publication Data: A catalogue record for this book is available from the British Library
Library of Congress Control Number: 2007932413
© Springer-Verlag London Limited 2008

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed on acid-free paper
Springer Science+Business Media
springer.com

To our parents and families

Contents

1 Introduction
  1.1 Two Fundamental Questions
    1.1.1 Why Should One Read The Book?
    1.1.2 What Is the Book About?
  1.2 The Structure of the Book
    1.2.1 Part I: From Perception to Computation
    1.2.2 Part II: Machine Learning
    1.2.3 Part III: Applications
    1.2.4 Appendices
  1.3 How to Read This Book
    1.3.1 Background and Learning Objectives
    1.3.2 Difficulty Level
    1.3.3 Problems
    1.3.4 Software
  1.4 Reading Tracks

Part I From Perception to Computation

2 Audio Acquisition, Representation and Storage
  2.1 Introduction
  2.2 Sound Physics, Production and Perception
    2.2.1 Acoustic Waves Physics
    2.2.2 Speech Production
    2.2.3 Sound Perception
  2.3 Audio Acquisition
    2.3.1 Sampling and Aliasing
    2.3.2 The Sampling Theorem**
    2.3.3 Linear Quantization
    2.3.4 Nonuniform Scalar Quantization
  2.4 Audio Encoding and Storage Formats
    2.4.1 Linear PCM and Compact Discs
    2.4.2 MPEG Digital Audio Coding
    2.4.3 AAC Digital Audio Coding
    2.4.4 Perceptual Coding
  2.5 Time-Domain Audio Processing
    2.5.1 Linear and Time-Invariant Systems
    2.5.2 Short-Term Analysis
    2.5.3 Time-Domain Measures
  Problems
  References

3 Image and Video Acquisition, Representation and Storage
  3.1 Introduction
  3.2 Human Eye Physiology
    3.2.1 Structure of the Human Eye
  3.3 Image Acquisition Devices
    3.3.1 Digital Camera
  3.4 Color Representation
    3.4.1 Human Color Perception
    3.4.2 Color Models
  3.5 Image Formats
    3.5.1 Image File Format Standards
    3.5.2 JPEG Standard
  3.6 Video Principles
  3.7 MPEG Standard
    3.7.1 Further MPEG Standards
  3.8 Conclusions
  Problems
  References

Part II Machine Learning

4 Machine Learning
  4.1 Introduction
  4.2 Taxonomy of Machine Learning
    4.2.1 Rote Learning
    4.2.2 Learning from Instruction
    4.2.3 Learning by Analogy
  4.3 Learning from Examples
    4.3.1 Supervised Learning
    4.3.2 Reinforcement Learning
    4.3.3 Unsupervised Learning
  4.4 Conclusions
  References

5 Bayesian Theory of Decision
  5.1 Introduction
  5.2 Bayes Decision Rule
  5.3 Bayes Classifier*
  5.4 Loss Function
    5.4.1 Binary Classification
  5.5 Zero-One Loss Function
  5.6 Discriminant Functions
    5.6.1 Binary Classification Case
  5.7 Gaussian Density
    5.7.1 Univariate Gaussian Density
    5.7.2 Multivariate Gaussian Density
    5.7.3 Whitening Transformation
  5.8 Discriminant Functions for Gaussian Likelihood
    5.8.1 Features Are Statistically Independent
    5.8.2 Covariance Matrix Is The Same for All Classes
    5.8.3 Covariance Matrix Is Not the Same for All Classes
  5.9 Receiver Operating Curves
  5.10 Conclusions
  Problems
  References

6 Clustering Methods
  6.1 Introduction
  6.2 Expectation and Maximization Algorithm*
    6.2.1 Basic EM*
  6.3 Basic Notions and Terminology
    6.3.1 Codebooks and Codevectors
    6.3.2 Quantization Error Minimization
    6.3.3 Entropy Maximization
    6.3.4 Vector Quantization
  6.4 K-Means
    6.4.1 Batch K-Means
    6.4.2 Online K-Means
    6.4.3 K-Means Software Packages
  6.5 Self-Organizing Maps
    6.5.1 SOM Software Packages
    6.5.2 SOM Drawbacks
  6.6 Neural Gas and Topology Representing Network
    6.6.1 Neural Gas
    6.6.2 Topology Representing Network
    6.6.3 Neural Gas and TRN Software Package
    6.6.4 Neural Gas and TRN Drawbacks
  6.7 General Topographic Mapping*
    6.7.1 Latent Variables*
    6.7.2 Optimization by EM Algorithm*
    6.7.3 GTM versus SOM*
    6.7.4 GTM Software Package
  6.8 Fuzzy Clustering Algorithms
    6.8.1 FCM
  6.9 Hierarchical Clustering
  6.10 Conclusion
  Problems
  References

7 Foundations of Statistical Learning and Model Selection
  7.1 Introduction
  7.2 Bias-Variance Dilemma
    7.2.1 Bias-Variance Dilemma for Regression
    7.2.2 Bias-Variance Decomposition for Classification*
  7.3 Model Complexity
  7.4 VC Dimension and Structural Risk Minimization
  7.5 Statistical Learning Theory*
    7.5.1 Vapnik-Chervonenkis Theory
  7.6 AIC and BIC Criteria
    7.6.1 Akaike Information Criterion
    7.6.2 Bayesian Information Criterion
  7.7 Minimum Description Length Approach
  7.8 Crossvalidation
    7.8.1 Generalized Crossvalidation
  7.9 Conclusion
  Problems
  References

8 Supervised Neural Networks and Ensemble Methods
  8.1 Introduction
  8.2 Artificial Neural Networks and Neural Computation
  8.3 Artificial Neurons
  8.4 Connections and Network Architectures
  8.5 Single-Layer Networks

Description:
This book is divided into three parts. From Perception to Computation shows how the physical supports of our auditory and visual perceptions, that is, acoustic waves and electromagnetic radiation, are converted into objects that can be manipulated by a computer. Machine Learning -
