
Linear Prediction of Speech PDF

299 Pages·1976·27.111 MB·English


J. D. Markel
A. H. Gray, Jr.

Linear Prediction of Speech

With 129 Figures

Springer-Verlag Berlin Heidelberg New York 1976

John D. Markel
Speech Communications Research Laboratory, Inc., Santa Barbara, California 93109, USA

Augustine H. Gray, Jr.
Department of Electrical Engineering and Computer Science, University of California, Santa Barbara, California 93106, USA, and Speech Communications Research Laboratory, Inc., Santa Barbara, California 93109, USA

ISBN-13: 978-3-642-66288-1
e-ISBN-13: 978-3-642-66286-7
DOI: 10.1007/978-3-642-66286-7

Library of Congress Cataloging in Publication Data. Markel, John D. 1943- Linear prediction of speech. (Communication and cybernetics; 12). Bibliography: p. Includes index. 1. Speech processing systems. 2. Speech synthesis. I. Gray, Augustine H., 1936- joint author. II. Title. TK7882.S65M37. 621.38. 75-40003

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher.

© by Springer-Verlag Berlin Heidelberg 1976. Softcover reprint of the hardcover 1st edition 1976

To Virginia and Averill

Preface

During the past ten years a new area in speech processing, generally referred to as linear prediction, has evolved. As with all scientific research, results did not always get published in a logical order and terminology was not always consistent. In mid-1974, we decided to begin an extra hours and weekends project of organizing the literature in linear prediction of speech and developing it into a unified presentation in terms of content and terminology.
This effort was completed in November, 1975, with the contents presented herein.

If there are two words which describe our goals in this book, they are unification and depth. Considerable effort has been spent on showing the interrelationships among various linear prediction formulations and solutions, and in developing extensions such as acoustic tube models and synthesis filter structures in a unified manner with consistent terminology. Topics are presented in such a manner that derivations and theoretical details are covered, along with Fortran subroutines and practical considerations. Using this approach we hope to have made the material useful for a wide range of backgrounds and interests.

The organization of material reflects our particular approach to presenting many important topics. The material itself reflects considerable research performed primarily by a rather small number of colleagues: Drs. Bishnu Atal, Fumitada Itakura, John Makhoul, Shuzo Saito, and Hisashi Wakita, as can be seen from the references. It is very gratifying to all of us who have been involved in developing this area to see the significant impact that linear prediction techniques have made in speech processing, and to see the rather large continuing research interest in this area.

Valuable technical comments and criticisms of certain portions of the manuscript were obtained from Drs. B. S. Atal, J. Makhoul, and H. Wakita. We are also indebted to Dr. Wakita for his assistance in the preparation of Chapter 4 and for providing us with several illustrations and examples. We are grateful to the following individuals who provided us with information and illustrations for use in the book: B. S. Atal, R. G. Crichton, W. B. Kendall, G. E. Kopec, S. S. (McCandless) Seneff, J. Makhoul, L. R. Morris, A. V. Oppenheim, L. L. Pfeifer, M. R. Sambur, L. R. Rabiner, C. J. Weinstein, and J. Welch. We would also like to thank D.
Cohen for his assistance in preparing the high-contrast computer program listings. For the countless hours of careful typing, and willingness to work within our rather strict deadlines, we are indebted to Marilyn Berner. The excellent layout work and drafting by David Sangster for the large number of original figures in this book is also greatly appreciated. We would also like to thank Dr. Beatrice Oshika for her careful proofreading of the final manuscript and page proofs. A special thanks is given to Dr. David Broad who read the complete manuscript for both content and grammar. His many valuable suggestions have greatly improved the logic and readability of the book.

Finally, the authors are indebted to Dr. June E. Shoup-Hummel, under whose guidance SCRL has provided a stimulating environment for studying the myriad problems that focus upon the basic nature of speech.

Santa Barbara, California
January 1, 1976

J. D. Markel
A. H. Gray, Jr.

Table of Contents

1. Introduction
1.1 Basic Physical Principles
1.2 Acoustical Waveform Examples
1.3 Speech Analysis and Synthesis Models
1.4 The Linear Prediction Model
1.5 Organization of Book

2. Formulations
2.1 Historical Perspective
2.2 Maximum Likelihood
2.3 Minimum Variance
2.4 Prony's Method
2.5 Correlation Matching
2.6 PARCOR (Partial Correlation)
2.6.1 Inner Products and an Orthogonality Principle
2.6.2 The PARCOR Lattice Structure

3. Solutions and Properties
3.1 Introduction
3.2 Vector Spaces and Inner Products
3.2.1 Filter or Polynomial Norms
3.2.2 Properties of Inner Products
3.2.3 Orthogonality Relations
3.3 Solution Algorithms
3.3.1 Correlation Matrix
3.3.2 Initialization
3.3.3 Gram-Schmidt Orthogonalization
3.3.4 Levinson Recursion
3.3.5 Updating Am(z)
3.3.6 A Test Example
3.4 Matrix Forms

4. Acoustic Tube Modeling
4.1 Introduction
4.2 Acoustic Tube Derivation
4.2.1 Single Section Derivation
4.2.2 Continuity Conditions
4.2.3 Boundary Conditions
4.3 Relationship between Acoustic Tube and Linear Prediction
4.4 An Algorithm, Examples, and Evaluation
4.4.1 An Algorithm
4.4.2 Examples
4.4.3 Evaluation of the Procedure
4.5 Estimation of Lip Impedance
4.5.1 Lip Impedance Derivation
4.6 Further Topics
4.6.1 Losses in the Acoustic Tube Model
4.6.2 Acoustic Tube Stability

5. Speech Synthesis Structures
5.1 Introduction
5.2 Stability
5.2.1 Step-up Procedure
5.2.2 Step-down Procedure
5.2.3 Polynomial Properties
5.2.4 A Bound on |Fm(z)|
5.2.5 Necessary and Sufficient Stability Conditions
5.2.6 Application of Results
5.3 Recursive Parameter Evaluation
5.3.1 Inner Product Properties
5.3.2 Equation Summary with Program
5.4 A General Synthesis Structure
5.5 Specific Speech Synthesis Structures
5.5.1 The Direct Form
5.5.2 Two-Multiplier Lattice Model
5.5.3 Kelly-Lochbaum Model
5.5.4 One-Multiplier Models
5.5.5 Normalized Filter Model
5.5.6 A Test Example

6. Spectral Analysis
6.1 Introduction
6.2 Spectral Properties
6.2.1 Zero Mean All-Pole Model
6.2.2 Gain Factor for Spectral Matching
6.2.3 Limiting Spectral Match
6.2.4 Non-uniform Spectral Weighting
6.2.5 Minimax Spectral Matching
6.3 A Spectral Flatness Model
6.3.1 A Spectral Flatness Measure
6.3.2 Spectral Flatness Transformations
6.3.3 Numerical Evaluation
6.3.4 Experimental Results
6.3.5 Driving Function Models
6.4 Selective Linear Prediction
6.4.1 Selective Linear Prediction (SLP) Algorithm
6.4.2 A Selective Linear Prediction Program
6.4.3 Computational Considerations
6.5 Considerations in Choice of Analysis Conditions
6.5.1 Choice of Method
6.5.2 Sampling Rates
6.5.3 Order of Filter
6.5.4 Choice of Analysis Interval
6.5.5 Windowing
6.5.6 Pre-emphasis
6.6 Spectral Evaluation Techniques
6.7 Pole Enhancement

7. Automatic Formant Trajectory Estimation
7.1 Introduction
7.2 Formant Trajectory Estimation Procedure
7.2.1 Introduction
7.2.2 Raw Data from A(z)
7.2.3 Examples of Raw Data
7.3 Comparison of Raw Data from Linear Prediction and Cepstral Smoothing
7.4 Algorithm 1
7.5 Algorithm 2
7.5.1 Definition of Anchor Points
7.5.2 Processing of Each Voiced Segment
7.5.3 Final Smoothing
7.5.4 Results and Discussion
7.6 Formant Estimation Accuracy
7.6.1 An Example of Synthetic Speech Analysis
7.6.2 An Example of Real Speech Analysis
7.6.3 Influence of Voice Periodicity

8. Fundamental Frequency Estimation
8.1 Introduction
8.2 Preprocessing by Spectral Flattening
8.2.1 Analysis of Voiced Speech with Spectral Regularity
8.2.2 Analysis of Voiced Speech with Spectral Irregularities
8.2.3 The STREAK Algorithm
8.3 Correlation Techniques
8.3.1 Autocorrelation Analysis
8.3.2 Modified Autocorrelation Analysis
8.3.3 Filtered Error Signal Autocorrelation Analysis
8.3.4 Practical Considerations
8.3.5 The SIFT Algorithm

9. Computational Considerations in Analysis
9.1 Introduction
9.2 Ill-Conditioning
9.2.1 A Measure of Ill-Conditioning
9.2.2 Pre-emphasis of Speech Data
9.2.3 Prefiltering before Sampling
9.3 Implementing Linear Prediction Analysis
9.3.1 Autocorrelation Method
9.3.2 Covariance Method
9.3.3 Computational Comparison
9.4 Finite Word Length Considerations
9.4.1 Finite Word Length Coefficient Computation
9.4.2 Finite Word Length Solution of Equations
9.4.3 Overall Finite Word Length Implementation

10. Vocoders
10.1 Introduction
10.2 Techniques
10.2.1 Coefficient Transformations
10.2.2 Encoding and Decoding
10.2.3 Variable Frame Rate Transmission
10.2.4 Excitation and Synthesis Gain Matching
10.2.5 A Linear Prediction Synthesizer Program
10.3 Low Bit Rate Pitch Excited Vocoders
10.3.1 Maximum Likelihood and PARCOR Vocoders
10.3.2 Autocorrelation Method Vocoders
10.3.3 Covariance Method Vocoders
10.4 Base-Band Excited Vocoders

11. Further Topics
11.1 Speaker Identification and Verification
11.2 Isolated Word Recognition
11.3 Acoustical Detection of Laryngeal Pathology
11.4 Pole-Zero Estimation
11.5 Summary and Future Directions

References
Subject Index

1. Introduction

Many different models have been postulated for quantitatively describing certain factors involved in the speech process. It can be stated with certainty that no single model has been developed which can account for all of the observed characteristics in human speech (nor would one probably desire such a model because of its inevitable complexity). A basic criterion of modeling is to find mathematical relations which can be used to represent a limited physical situation with a minimum of complexity and a maximum of accuracy. One of the most successful models of acoustical speech behavior is the linear speech production model developed by Fant [1960].
This model will be referred to throughout as the speech production model.

In recent years the mathematical technique of linear prediction has been applied to the problem of modeling speech behavior. The linear prediction model can be related to the speech production model, with the significant feature that the parameters of the speech production model are easily obtained using linear mathematics. In this chapter, the linear prediction model is developed and the relationship to the speech production model is shown.

A starting point which can be used in developing the speech production model is speech physiology. Speech physiology is the springboard for many different areas which are relevant to a better understanding of speech. The discussions here, however, will consider only briefly the physical principles of speech. The main focus in this book will be the acoustical properties of speech. Detailed discussions of both the physiological and acoustical characteristics of speech are contained in Fant [1960] and Flanagan [1972], and in Peterson and Shoup [1966a, 1966b].

1.1 Basic Physical Principles

The acoustical speech waveform is an acoustic pressure wave which originates from voluntary physiological movements of the structures shown in Fig. 1.1. Air is expelled from the lungs into the trachea and then forced between the vocal folds. During the generation of voiced sounds such as /i/* in eve, the air pushed

* The symbol / / is used to denote the phoneme, a basic linguistic unit. Definitions of the phonemes of the General American dialect are given in Flanagan [1972, pp. 15-22].
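
The Introduction's remark that the model parameters "are easily obtained using linear mathematics" can be illustrated with a minimal sketch. It assumes the autocorrelation formulation solved by a Levinson recursion, both of which appear in the table of contents; the book supplies Fortran subroutines, so this Python version is not the authors' code. The function names `autocorrelation`, `levinson`, and `synth_two_pole`, the test coefficients 1.3 and -0.8, and the sign convention e[n] = s[n] + a[1] s[n-1] + ... + a[p] s[n-p] are all assumptions made for illustration.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of s[n] * s[n + k]."""
    n = len(signal)
    return [sum(signal[i] * signal[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson(r, order):
    """Solve the autocorrelation normal equations by a Levinson recursion.

    Returns (a, err): a[0] == 1 and a[1..order] are the predictor
    coefficients under the assumed convention
    e[n] = sum over k of a[k] * s[n - k]; err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Correlation between the current error filter and lag m.
        acc = sum(a[k] * r[m - k] for k in range(m))
        k_m = -acc / err  # reflection coefficient for stage m
        prev = a[:]
        for k in range(1, m + 1):
            a[k] = prev[k] + k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a, err


def synth_two_pole(n_samples):
    """Impulse response of s[n] = 1.3 s[n-1] - 0.8 s[n-2] + delta[n].

    The coefficients are hypothetical values chosen to give a stable,
    resonant two-pole filter; they do not come from the book.
    """
    s = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        s1 = s[n - 1] if n >= 1 else 0.0
        s2 = s[n - 2] if n >= 2 else 0.0
        s.append(1.3 * s1 - 0.8 * s2 + x)
    return s


signal = synth_two_pole(400)
a, err = levinson(autocorrelation(signal, 2), 2)
print([round(c, 6) for c in a], round(err, 6))
```

Because the synthetic signal is the (effectively complete) impulse response of an all-pole filter, the recovered coefficients should be very close to 1, -1.3, and 0.8. Real speech would additionally need framing and windowing, which this sketch omits.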
