SPRINGER BRIEFS IN ELECTRICAL AND COMPUTER ENGINEERING
SPEECH TECHNOLOGY

Mohamed Hesham Farouk
Application of Wavelets in Speech Processing
Second Edition

Series editor
Amy Neustein, Fort Lee, NJ, USA

Editor's Note

The authors of this series have been hand-selected. They comprise some of the most outstanding scientists, drawn from academia and private industry, whose research is marked by its novelty, applicability, and practicality in providing broad-based speech solutions. The SpringerBriefs in Speech Technology series provides the latest findings in speech technology gleaned from comprehensive literature reviews and empirical investigations that are performed in both laboratory and real-life settings. Some of the topics covered in this series include the presentation of real-life commercial deployment of spoken dialog systems, contemporary methods of speech parameterization, developments in information security for automated speech, forensic speaker recognition, use of sophisticated speech analytics in call centers, and an exploration of new methods of soft computing for improving human-computer interaction.

Those in academia, the private sector, the self-service industry, law enforcement, and government intelligence are among the principal audience for this series, which is designed to serve as an important and essential reference guide for speech developers, system designers, speech engineers, linguists, and others. In particular, a major audience of readers will consist of researchers and technical experts in the automated call center industry, where speech processing is a key component in the functioning of customer care contact centers.

Amy Neustein, Ph.D., serves as Editor-in-Chief of the International Journal of Speech Technology (Springer).
She edited the recently published book "Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics" (Springer 2010), and serves as guest columnist on speech processing for Womensenews. Dr. Neustein is Founder and CEO of Linguistic Technology Systems, a NJ-based think tank for intelligent design of advanced natural-language-based emotion-detection software to improve human response in monitoring recorded conversations of terror suspects and helpline calls. Dr. Neustein's work appears in the peer-reviewed literature and in industry and mass media publications. Her academic books, which cover a range of political, social, and legal topics, have been cited in the Chronicle of Higher Education and have won her a Pro Humanitate Literary Award. She serves on the visiting faculty of the National Judicial College and as a plenary speaker at conferences in artificial intelligence and computing. Dr. Neustein is a member of MIR (machine intelligence research) Labs, which does advanced work in computer technology to assist underdeveloped countries in improving their ability to cope with famine, disease/illness, and political and social affliction. She is a founding member of the New York City Speech Processing Consortium, a newly formed group of NY-based companies, publishing houses, and researchers dedicated to advancing speech technology research and development.
More information about this series at http://www.springer.com/series/10043

Mohamed Hesham Farouk
Department of Engineering, Math and Physics
Cairo University, Faculty of Engineering
Giza, Egypt

ISSN 2191-8112    ISSN 2191-8120 (electronic)
SpringerBriefs in Electrical and Computer Engineering
ISSN 2191-737X    ISSN 2191-7388 (electronic)
SpringerBriefs in Speech Technology
ISBN 978-3-319-69001-8    ISBN 978-3-319-69002-5 (eBook)
https://doi.org/10.1007/978-3-319-69002-5
Library of Congress Control Number: 2017958884

© The Author(s) 2014, 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To the soul of my mother

Preface

This is a new edition of the title Application of Wavelets in Speech Processing, published in 2014. All chapters of the previous edition have been revised in this work. The sequence of topics has been rearranged to reflect the dependencies among them, so that information flows more smoothly through the chapters. New subjects have been added in the current edition to reflect advances in the use of wavelet analysis for speech processing and its widespread applications.

Organization of the Book

The chapters of this book have been structured such that each one is self-contained and can be read separately. Each chapter is concerned with a specific application of wavelets in speech technology. Every section of a chapter surveys the literature on its topic, explaining how wavelets are used in each work and then discussing the experimental results of the proposed method.

Chapter 1 introduces the topic of speech processing, while Chap. 2 discusses the processes of speech production and different approaches to modeling a speech signal. Chapter 3 then explains how wavelets can describe and model many features of a speech signal. Applications of the wavelet transform (WT) in speech processing are the subjects of the subsequent chapters. The power of the WT in estimating the spectral characteristics of speech is explained in Chap. 4, which shows how elements of the spectrum, such as pitch and formants, can be derived. Chapter 5 considers the problem of speech activity detection and signal separation based on features extracted from the WT. Enhancement and noise cancellation are reviewed in Chap. 6, which shows how the WT improves the process. The problem of speech recognition is discussed in Chap.
7 in view of the powerful features provided by wavelet analysis. Another recognition problem is considered in Chap. 8, which discusses the identification of a speaker from his or her voice. A similar topic, emotion recognition through the wavelet features of an utterance, is elucidated in Chap. 9. Another key application of speech is discussed in Chap. 10, which shows how a speech signal can be decoded and synthesized using a low-dimensional feature domain. The assessment of speech quality based on WT coefficients is surveyed and explained in Chap. 11. Chapter 12 has been added in this edition to examine how nonlinear features can be extracted from a speech signal by the WT. Furthermore, Chap. 13 addresses critical applications of the speech signal in security and steganography. Finally, Chap. 14 discusses clinical diagnosis based on wavelet features of speech recorded from a patient.

Acknowledgment

The author would like to thank the editorial board of the SpringerBriefs series for letting him prepare this monograph and for their continuous cooperation during the preparation of the work. Thanks also go to his colleagues at the Engineering Physics Department, Bahira Elsebelgy, Ph.D., and M. El-Gohary, M.Sc., for their help with proofreading.

Giza, Egypt
Mohamed Hesham Farouk

Contents

1 Introduction
  1.1 History and Definition of Speech Processing
  1.2 Applications of Speech Processing
  1.3 Recent Progress in Speech Processing
  1.4 Wavelet Analysis as an Efficient Tool for Speech Processing
  References
2 Speech Production and Perception
  2.1 Speech Production Process
  2.2 Classification of Speech Sounds
  2.3 Speech Production Modeling
  2.4 Speech Perception Modeling
  2.5 Intelligibility and Speech Quality Measures
  References
3 Wavelets, Wavelet Filters, and Wavelet Transforms
  3.1 Short-Time Fourier Transform (STFT)
  3.2 Multiresolution Analysis and Wavelet Transform
  3.3 Wavelets and Bank of Filters
  3.4 Wavelet Families
  3.5 Wavelet Packets
  3.6 Undecimated Wavelet Transform
  3.7 The Continuous Wavelet Transform (CWT)
  3.8 Wavelet Scalogram
  3.9 Empirical Wavelets
  References
4 Spectral Analysis of Speech Signal and Pitch Estimation
  4.1 Spectral Analysis
  4.2 Formant Tracking and Estimation
  4.3 Pitch Estimation
  References
5 Speech Detection and Separation
  5.1 Voice Activity Detection
  5.2 Segmentation of Speech Signal
  5.3 Source Separation of Speech
  References
6 Speech Enhancement and Noise Suppression
  6.1 Thresholding Schemes
  6.2 Thresholding on Wavelet Packet Coefficients
  6.3 Enhancement on Multitaper Spectrum
  References
7 Speech Recognition
  7.1 Signal Enhancement and Noise Cancellation for Robust Recognition
  7.2 Wavelet-Based Features for Better Recognition
  7.3 Hybrid Approach
  7.4 Wavelet as an Activation Function for Neural Networks in ASR
  References
8 Speaker Identification
  8.1 Wavelet-Based Features for Speaker Identification
  8.2 Hybrid Feature Sets for Speaker Identification
  References
9 Emotion Recognition from Speech
  9.1 Wavelet-Based Features for Emotion Recognition
  9.2 Combined Feature Set for Better Emotion Recognition
  9.3 WNN for Emotion Recognition
  References
10 Speech Coding, Synthesis, and Compression
  10.1 Speech Synthesis
  10.2 Speech Coding and Compression
  10.3 Real-Time Implementation of DWT-Based Speech Compression
  References
11 Speech Quality Assessment
  11.1 Wavelet-Packet Analysis
  11.2 Discrete Wavelet Transform
  References
12 Scalogram and Nonlinear Analysis of Speech
  12.1 Wavelet-Based Nonlinear Features
  12.2 Wavelet Scalogram Analysis
  12.3 Nonlinear and Chaotic Components in Speech Signal
  References