SPEECH CODING ALGORITHMS
Foundation and Evolution of Standardized Coders

WAI C. CHU
Mobile Media Laboratory
DoCoMo USA Labs
San Jose, California

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: [email protected].

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data:
Chu, Wai C.
Speech coding algorithms: Foundation and evolution of standardized coders
ISBN 0-471-37312-5

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Intelligence is the fruit of industriousness
Accretion of knowledge creates genii
A Chinese proverb

CONTENTS

PREFACE xiii
ACRONYMS xix
NOTATION xxiii

1 INTRODUCTION 1
  1.1 Overview of Speech Coding / 2
  1.2 Classification of Speech Coders / 8
  1.3 Speech Production and Modeling / 11
  1.4 Some Properties of the Human Auditory System / 18
  1.5 Speech Coding Standards / 22
  1.6 About Algorithms / 26
  1.7 Summary and References / 31

2 SIGNAL PROCESSING TECHNIQUES 33
  2.1 Pitch Period Estimation / 33
  2.2 All-Pole and All-Zero Filters / 45
  2.3 Convolution / 52
  2.4 Summary and References / 57
  Exercises / 57

3 STOCHASTIC PROCESSES AND MODELS 61
  3.1 Power Spectral Density / 62
  3.2 Periodogram / 67
  3.3 Autoregressive Model / 69
  3.4 Autocorrelation Estimation / 73
  3.5 Other Signal Models / 85
  3.6 Summary and References / 86
  Exercises / 87

4 LINEAR PREDICTION 91
  4.1 The Problem of Linear Prediction / 92
  4.2 Linear Prediction Analysis of Nonstationary Signals / 96
  4.3 Examples of Linear Prediction Analysis of Speech / 101
  4.4 The Levinson–Durbin Algorithm / 107
  4.5 The Leroux–Gueguen Algorithm / 114
  4.6 Long-Term Linear Prediction / 120
  4.7 Synthesis Filters / 127
  4.8 Practical Implementation / 131
  4.9 Moving Average Prediction / 137
  4.10 Summary and References / 138
  Exercises / 139

5 SCALAR QUANTIZATION 143
  5.1 Introduction / 143
  5.2 Uniform Quantizer / 147
  5.3 Optimal Quantizer / 149
  5.4 Quantizer Design Algorithms / 151
  5.5 Algorithmic Implementation / 155
  5.6 Summary and References / 158
  Exercises / 158

6 PULSE CODE MODULATION AND ITS VARIANTS 161
  6.1 Uniform Quantization / 161
  6.2 Nonuniform Quantization / 166
  6.3 Differential Pulse Code Modulation / 172
  6.4 Adaptive Schemes / 175
  6.5 Summary and References / 180
  Exercises / 181

7 VECTOR QUANTIZATION 184
  7.1 Introduction / 185
  7.2 Optimal Quantizer / 188
  7.3 Quantizer Design Algorithms / 189
  7.4 Multistage VQ / 194
  7.5 Predictive VQ / 216
  7.6 Other Structured Schemes / 219
  7.7 Summary and References / 221
  Exercises / 222

8 SCALAR QUANTIZATION OF LINEAR PREDICTION COEFFICIENT 227
  8.1 Spectral Distortion / 227
  8.2 Quantization Based on Reflection Coefficient and Log Area Ratio / 232
  8.3 Line Spectral Frequency / 239
  8.4 Quantization Based on Line Spectral Frequency / 252
  8.5 Interpolation of LPC / 256
  8.6 Summary and References / 258
  Exercises / 260

9 LINEAR PREDICTION CODING 263
  9.1 Speech Production Model / 264
  9.2 Structure of the Algorithm / 268
  9.3 Voicing Detector / 271
  9.4 The FS1015 LPC Coder / 275
  9.5 Limitations of the LPC Model / 277
  9.6 Summary and References / 280
  Exercises / 281

10 REGULAR-PULSE EXCITATION CODERS 285
  10.1 Multipulse Excitation Model / 286
  10.2 Regular-Pulse-Excited–Long-Term Prediction / 289
  10.3 Summary and References / 295
  Exercises / 296

11 CODE-EXCITED LINEAR PREDICTION 299
  11.1 The CELP Speech Production Model / 300
  11.2 The Principle of Analysis-by-Synthesis / 301
  11.3 Encoding and Decoding / 302
  11.4 Excitation Codebook Search / 308
  11.5 Postfilter / 317
  11.6 Summary and References / 325
  Exercises / 326

12 THE FEDERAL STANDARD VERSION OF CELP 330
  12.1 Improving the Long-Term Predictor / 331
  12.2 The Concept of the Adaptive Codebook / 333
  12.3 Incorporation of the Adaptive Codebook to the CELP Framework / 336
  12.4 Stochastic Codebook Structure / 338
  12.5 Adaptive Codebook Search / 341
  12.6 Stochastic Codebook Search / 344
  12.7 Encoder and Decoder / 346
  12.8 Summary and References / 349
  Exercises / 350

13 VECTOR SUM EXCITED LINEAR PREDICTION 353
  13.1 The Core Encoding Structure / 354
  13.2 Search Strategies for Excitation Codebooks / 356
  13.3 Excitation Codebook Searches / 357
  13.4 Gain Related Procedures / 362
  13.5 Encoder and Decoder / 366
  13.6 Summary and References / 368
  Exercises / 369

14 LOW-DELAY CELP 372
  14.1 Strategies to Achieve Low Delay / 373
  14.2 Basic Operational Principles / 375
  14.3 Linear Prediction Analysis / 377
  14.4 Excitation Codebook Search / 380
  14.5 Backward Gain Adaptation / 385
  14.6 Encoder and Decoder / 389
  14.7 Codebook Training / 391
  14.8 Summary and References / 393
  Exercises / 394

15 VECTOR QUANTIZATION OF LINEAR PREDICTION COEFFICIENT 396
  15.1 Correlation Among the LSFs / 396
  15.2 Split VQ / 399
  15.3 Multistage VQ / 403
  15.4 Predictive VQ / 407
  15.5 Summary and References / 418
  Exercises / 419

16 ALGEBRAIC CELP 423
  16.1 Algebraic Codebook Structure / 424
  16.2 Adaptive Codebook / 425
  16.3 Encoding and Decoding / 433
  16.4 Algebraic Codebook Search / 437
  16.5 Gain Quantization Using Conjugate VQ / 443
  16.6 Other ACELP Standards / 446
  16.7 Summary and References / 451
  Exercises / 451

17 MIXED EXCITATION LINEAR PREDICTION 454
  17.1 The MELP Speech Production Model / 455
  17.2 Fourier Magnitudes / 456
  17.3 Shaping Filters / 464
  17.4 Pitch Period and Voicing Strength Estimation / 466
  17.5 Encoder Operations / 474
  17.6 Decoder Operations / 477
  17.7 Summary and References / 481
  Exercises / 482

18 SOURCE-CONTROLLED VARIABLE BIT-RATE CELP 486
  18.1 Adaptive Rate Decision / 487
  18.2 LP Analysis and LSF-Related Operations / 494
  18.3 Decoding and Encoding / 496
  18.4 Summary and References / 498
  Exercises / 499

19 SPEECH QUALITY ASSESSMENT 501
  19.1 The Scope of Quality and Measuring Conditions / 501
  19.2 Objective Quality Measurements for Waveform Coders / 502
  19.3 Subjective Quality Measures / 504
  19.4 Improvements on Objective Quality Measures / 505

APPENDIX A MINIMUM-PHASE PROPERTY OF THE FORWARD PREDICTION-ERROR FILTER 507
APPENDIX B SOME PROPERTIES OF LINE SPECTRAL FREQUENCY 514
APPENDIX C RESEARCH DIRECTIONS IN SPEECH CODING 518
APPENDIX D LINEAR COMBINER FOR PATTERN CLASSIFICATION 522
APPENDIX E CELP: OPTIMAL LONG-TERM PREDICTOR TO MINIMIZE THE WEIGHTED DIFFERENCE 531
APPENDIX F REVIEW OF LINEAR ALGEBRA: ORTHOGONALITY, BASIS, LINEAR INDEPENDENCE, AND THE GRAM–SCHMIDT ALGORITHM 537

BIBLIOGRAPHY 542
INDEX 553