
Mathematical Modeling and Signal Processing in Speech and Hearing Sciences

214 Pages·2014·17.002 MB·English
by Jack Xin and Yingyong Qi


MS&A Volume 10

Editor-in-Chief: A. Quarteroni
Series Editors: T. Hou, C. Le Bris, A.T. Patera, E. Zuazua

For further volumes: http://www.springer.com/series/8377

Jack Xin · Yingyong Qi

Mathematical Modeling and Signal Processing in Speech and Hearing Sciences

Springer

Jack Xin, Department of Mathematics, UC Irvine, Irvine, CA, USA
Yingyong Qi, Department of Mathematics, UC Irvine, Irvine, CA, USA

ISSN: 2037-5255
ISSN: 2037-5263 (electronic)
MS&A - Modeling, Simulation & Applications
ISBN 978-3-319-03085-2
ISBN 978-3-319-03086-9 (eBook)
DOI 10.1007/978-3-319-03086-9
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013951655
© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Cover Design: Beatrice II, Milano
Cover figure: Dandan Yu
Typesetting with LaTeX: PTP-Berlin, Protago TEX-Production GmbH, Germany (www.ptp-berlin.de)
Springer is a part of Springer Science+Business Media (www.springer.com)

Dedicated with love to our families

Preface

Speech and hearing sciences are fundamental to numerous technological advances of the digital world in the past decade, from music compression in MP3 to digital hearing aids, from network based voice enabled services to speech interaction with mobile phones. Mathematics and computation are intimately related to these leaps and bounds. On the other hand, speech and hearing are strongly interdisciplinary areas where dissimilar scientific and engineering publications and approaches often coexist and make it difficult for newcomers to enter.

The aim of our book is to give an accessible introduction to mathematical models and signal processing methods in speech and hearing sciences for senior undergraduate and beginning graduate students with basic knowledge of linear algebra, differential equations, numerical analysis, and probability. The models and methods are selected based on their physical and biological origin, mathematical simplicity, and their utility for signal processing. Connections are drawn as much as possible between model solutions and speech/hearing phenomena.
Concepts such as critical bands, sound masking, and hearing loss are introduced in terms of both model solutions and experimental data. It is our hope that the self-contained presentation of hidden Markov models and the associated Matlab codes for isolated word recognition in chapter four will help make speech recognition accessible to beginners. We include representative Matlab programs and a moderate number of exercises in each chapter to help the readers gain hands-on experience and consolidate their understanding. Speech data for the Matlab programs are either clean signals or recorded mixtures downloadable from the first author's website. Matlab signal processing and statistics toolboxes are needed for some of the programs. The mathematical tools consist of elementary analysis of differential equations, asymptotic and numerical methods, transform techniques, filtering and clustering methods, statistical and optimization methods. Some of these tools show up multiple times in the book, especially in the context of solving concrete model and real world problems.

The first chapter of the book presents background materials on function spaces, Fourier and z-transforms, filtering-clustering-spectral analysis of data, optimization and statistical methods. Chapter two is on modeling speech production with mechanical and digital source-filter models. Chapter three discusses partial differential equation (PDE) models of the peripheral auditory system, their analysis and computation, their applications in sound transform and processing, and hearing aids. Chapter four introduces the hidden Markov concept, the framework of speech recognition, and the related learning and searching algorithms. Chapter five studies blind source separation and speech enhancement (noise reduction) methods based on statistical criteria, sparsity and feature clustering in time-frequency domain.
The order of chapter two to chapter five follows logically the human speech chain: speech production, audition, recognition and signal processing.

The book is based on the authors' decade long collaborations with graduate students, postdoctoral fellows and colleagues in mathematics, speech and hearing sciences, and signal processing. We are grateful to Professor Stanley Osher for his constant support and his pioneering work on image processing that inspired us. We thank the following colleagues (in alphabetical order) for their interest, encouragement and assistance that helped us embark on our journey and pursue our goals: Professors Luis Caffarelli, Russel Caflisch, Emmanuel Candes, Tony Chan, Ingrid Daubechies, Susan Friedlander, Irene Gamba, James Hyman, Joe Keller, Peter Lax, Jerry Marsden, Tinsley Oden, George Papanicolaou, Charles Peskin, George Pollak, Donald Saari, Charles Steele, Ronald Stern, Howard Tucker, Frederick Wan, Shing Tung Yau, and Hongkai Zhao. We thank Professors Li Deng, Deliang Wang, Yang Wang, and Fan-Gang Zeng for many fruitful discussions on speech and hearing research and applications. Progress would not have been possible without the opportunity of working with creative and energetic students, postdoctoral fellows and visiting scholars (in chronological order): M. Drew LaMar, Yongsam Kim, Jie Liu, Hsin-I Yang, Meng Yu, J. Ernie Esser, Yuanchang Sun, Wenye Ma, Ryan Ritch, Penghang Yin, Daniel Quang, Yifei Lou, He Qi and Xiaohua Shi. Part of the book has been used for training and supervised research experience of the undergraduate students of the NSF supported PRISM (Proactive Recruitment in Introductory Science and Mathematics) program at UC Irvine (iCAMP) with the help of Dr. Ernie Esser (2009-2013).
We benefited from the IMA Speech Processing Workshop at the University of Minnesota in 2000, and from organizing and interacting with the participants of the IPAM workshop on "Mathematics of the Ear and Sound Signal Processing" at UCLA in 2005. Part of the materials is drawn from lectures at the Beijing Summer School in 2010 organized by Professor Zhimin Chen at Academia Sinica. We thank Professor Thomas Hou for kindly hosting one of us at Caltech while our work was ongoing, and for suggesting this book project. We thank Dandan Yu for the cover figure design. Finally, we acknowledge the financial support from the National Science Foundation (NSF), the Guggenheim Foundation, the Army Research Office, the National Institute of Health, the University of Texas at Austin, and the University of California at Irvine.

Irvine, California, September 2013
Jack Xin
Yingyong Qi

Contents

1 Background Signal Processing, Statistical and Optimization Methods
  1.1 Introduction
  1.2 Fourier and z-Transforms
    1.2.1 Continuous Time Signals
    1.2.2 Fourier Transform and Basic Properties
    1.2.3 Discrete Time Signals and Systems
    1.2.4 Sampling and Shannon Theory
    1.2.5 Discrete Fourier Transform
    1.2.6 Discrete Time and Windowed Fourier Transforms
    1.2.7 Short Time Fourier Transform, Synthesis and Spectrogram
    1.2.8 z-Transform
  1.3 Filtering and Convolution
    1.3.1 Circular Convolution
    1.3.2 Linear Convolution and z-Transform
    1.3.3 Circular Convolution and z-Transform
    1.3.4 Rational Filters, Impulse and Frequency Responses
    1.3.5 Group and Phase Delays
    1.3.6 Minimum Phase and All Pass Filters
  1.4 Random Variables, Correlation and Independence
    1.4.1 Basic Notion and Examples
    1.4.2 Joint Distribution and Independent Components
    1.4.3 Random Number Generation
    1.4.4 Stochastic Processes
    1.4.5 Random Walk and Brownian Motion
  1.5 Data Clustering and K-Means Method
  1.6 Maximum Likelihood Method
  1.7 Least Squares and Sparse Optimization Methods
  1.8 Exercises

2 Speech Modeling
  2.1 Introduction
  2.2 Two Mass Vocal Fold Model
  2.3 Matlab Program and Animation of Two Mass Model
  2.4 Hydrodynamic Semi-Continuum Vocal Fold Model
  2.5 Source-Filter Model of Speech Production
    2.5.1 Uniform Lossless Tube Model and Transfer Function
    2.5.2 Concatenated Lossless Tube Model: Traveling Waves and Transfer Function
    2.5.3 Radiation and the Complete Model
    2.5.4 Matlab Programs for Vowel and Consonant Synthesis
  2.6 Exercises

3 Auditory Modeling
  3.1 Introduction
  3.2 Macromechanics and Passive Models
  3.3 Micromechanics and Two Level Nonlocal Active Models
  3.4 Dispersion and Decay Properties of Plane Waves
  3.5 Time Harmonic Solutions
  3.6 Asymptotic and Transform Techniques
  3.7 Logarithmic Scales and Critical Bands
  3.8 Time Domain Method and Dispersive Instability
  3.9 Boundary Integral Method and Suppression of Instability
  3.10 Computational Methods of Nonlocal Active Models
  3.11 Nonlinear Phenomena and Sound Masking
  3.12 Invertible Auditory Transforms
  3.13 Orthogonal Auditory Transforms
  3.14 Modeling Masking Thresholds
  3.15 Modeling Hearing Loss and Hearing Aids
  3.16 Matlab Programs
  3.17 Exercises

4 Speech Recognition
  4.1 Introduction
  4.2 Hidden Markov Model (HMM) for Speech Processing
    4.2.1 Speech Spectral Analysis
    4.2.2 Vector Quantization
  4.3 HMM for Isolated Word Recognition
    4.3.1 Forward and Backward Probabilities
    4.3.2 Baum-Welch Re-Estimation
    4.3.3 Viterbi Decoding
  4.4 Summary of Matlab Programs
  4.5 Chapter Summary
  4.6 Matlab Programs
  4.7 Exercises

5 Blind Source Separation and Speech Enhancement
  5.1 Introduction
  5.2 Instantaneous Mixture and Decorrelation Methods
    5.2.1 Decorrelation with Second Order Statistics
    5.2.2 Demixing with Joint Second and Third Order Statistics
  5.3 Instantaneous Mixture and Cumulant Method
    5.3.1 Moments and Cumulants
    5.3.2 Source Recovery and Whitening Process
    5.3.3 Unitary Factor as Joint Diagonalizer of Cumulant Matrices
    5.3.4 Joint Diagonalization of Eigenmatrices
    5.3.5 Jacobi Method and Joint Diagonalizer Formula
  5.4 Instantaneous Mixture and Infomax Methods
    5.4.1 Statistical Equations for Source Separation
    5.4.2 Iterative Methods
    5.4.3 Uniform Bounds
    5.4.4 Convergence and Source Separation
    5.4.5 Numerical Example
  5.5 Convolutive Mixture and Decorrelation Method
    5.5.1 Decorrelation Equations
    5.5.2 Constrained and Penalized Optimization
    5.5.3 Numerical Example
  5.6 Convolutive Mixture and Infomax Methods
    5.6.1 Extensions and Analysis of Algorithms
    5.6.2 Numerical Example
  5.7 Relative Sparsity and Time-Frequency Domain Methods
  5.8 Convex Speech Enhancement Model
    5.8.1 Convex Model and ℓ1 Regularization
    5.8.2 Minimization by Bregman Method
  5.9 Summary and Other Methods for Further Reading
  5.10 Matlab Programs
  5.11 Exercises

References

Index
