Table of Contents

AUDIO SIGNAL PROCESSING FOR NEXT-GENERATION MULTIMEDIA COMMUNICATION SYSTEMS

Edited by
YITENG (ARDEN) HUANG, Bell Laboratories, Lucent Technologies
JACOB BENESTY, Université du Québec, INRS-EMT

KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW

eBook ISBN: 1-4020-7769-6
Print ISBN: 1-4020-7768-8

©2004 Kluwer Academic Publishers, New York, Boston, Dordrecht, London, Moscow
Print ©2004 Kluwer Academic Publishers, Boston

All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com

Contents

Preface
Contributing Authors

1 Introduction
  Yiteng (Arden) Huang, Jacob Benesty
  1. Multimedia Communications
  2. Challenges and Opportunities
  3. Organization of the Book

Part I Speech Acquisition and Enhancement

2 Differential Microphone Arrays
  Gary W. Elko
  1. Introduction
  2. Differential Microphone Arrays
  3. Array Directional Gain
  4. Optimal Arrays for Isotropic Fields
    4.1 Maximum Directional Gain
    4.2 Maximum Directivity Index for Differential Microphones
    4.3 Maximum Front-to-Back Ratio
    4.4 Minimum Peak Directional Response
    4.5 Beam width
  5. Design Examples
    5.1 First-Order Designs
    5.2 Second-Order Designs
    5.3 Third-Order Designs
    5.4 Higher-Order Designs
  6. Sensitivity to Microphone Mismatch and Noise
  7. Conclusions

3 Spherical Microphone Arrays for 3D Sound Recording
  Jens Meyer, Gary W. Elko
  1. Introduction
  2. Fundamental Concept
  3. The Eigenbeamformer
    3.1 Discrete Orthonormality
    3.2 The Eigenbeams
    3.3 The Modal Coefficients
  4. Modal-Beamformer
    4.1 Combining Unit
    4.2 Steering Unit
  5. Robustness Measure
  6. Beampattern Design
    6.1 Arbitrary Beampattern Design
    6.2 Optimum Beampattern Design
  7. Measurements
  8. Summary
  9. Appendix A

4 Subband Noise Reduction Methods for Speech Enhancement
  Eric J. Diethorn
  1. Introduction
  2. Wiener Filtering
  3. Speech Enhancement by Short-Time Spectral Modification
    3.1 Short-Time Fourier Analysis and Synthesis
    3.2 Short-Time Wiener Filter
    3.3 Power Subtraction
    3.4 Magnitude Subtraction
    3.5 Parametric Wiener Filtering
    3.6 Review and Discussion
  4. Averaging Techniques for Envelope Estimation
    4.1 Moving Average
    4.2 Single-Pole Recursion
    4.3 Two-Sided Single-Pole Recursion
    4.4 Nonlinear Data Processing
  5. Example Implementation
    5.1 Subband Filter Bank Architecture
    5.2 A-Posteriori-SNR Voice Activity Detector
    5.3 Example
  6. Conclusion

Part II Acoustic Echo Cancellation

5 Adaptive Algorithms for MIMO Acoustic Echo Cancellation
  Jacob Benesty, Tomas Gänsler, Yiteng (Arden) Huang, Markus Rupp
  1. Introduction
  2. Normal Equations and Identification of a MIMO System
    2.1 Normal Equations
    2.2 The Nonuniqueness Problem
    2.3 The Impulse Response Tail Effect
    2.4 Some Different Solutions for Decorrelation
  3. The Classical and Factorized Multichannel RLS
  4. The Multichannel Fast RLS
  5. The Multichannel LMS Algorithm
    5.1 Classical Derivation
    5.2 Improved Version
  6. The Multichannel APA
    6.1 The Straightforward Multichannel APA
    6.2 The Improved Two-Channel APA
    6.3 The Improved Multichannel APA
  7. The Multichannel Exponentiated Gradient Algorithm
  8. The Multichannel Frequency-domain Adaptive Algorithm
  9. Conclusions

6 Double-Talk Detectors for Acoustic Echo Cancelers
  Tomas Gänsler, Jacob Benesty
  1. Introduction
  2. Basics of AEC and DTD
    2.1 AEC Notations
    2.2 The Generic DTD
    2.3 A Suggestion to Performance Evaluation of DTDs
  3. Double-Talk Detection Algorithms
    3.1 The Geigel Algorithm
    3.2 The Cross-Correlation Method
    3.3 The Normalized Cross-Correlation Method
    3.4 The Coherence Method
    3.5 The Normalized Cross-correlation Matrix
    3.6 The Two-Path Model
    3.7 DTD Combinations with Robust Statistics
  4. Comparison of DTDs by Means of the ROC
  5. Discussion

7 The WinEC: A Real-Time Hands-Free Stereo Communication System
  Tomas Gänsler, Volker Fischer, Eric J. Diethorn, Jacob Benesty
  1. Introduction
    1.1 Signal Model
  2. System Description
    2.1 The Audio Module
    2.2 The Network Module
    2.3 The Echo Canceler Module
  3. Algorithms of the Echo Canceler Module
    3.1 Adaptive Filter Algorithm
  4. Residual Echo and Noise Suppression
    4.1 Masking Threshold for Residual Echo in Noise
    4.2 Analysis of Echo Suppression Requirements
    4.3 Noise and Residual Echo Suppression
  5. Simulations
  6. Real-Time Tests with Different Modes of Operation
    6.1 Point-to-Point Communication
    6.2 Multi-Point Communication
    6.3 Transatlantic Teleconference in Stereo
  7. Discussion

Part III Sound Source Tracking and Separation

8 Time Delay Estimation
  Jingdong Chen, Yiteng (Arden) Huang, Jacob Benesty
  1. Introduction
  2. Signal Models
    2.1 Ideal Propagation Model
    2.2 Multipath Model
    2.3 Reverberant Model
  3. Generalized Cross-Correlation Method
  4. The Multichannel Cross-Correlation Algorithm
    4.1 Spatial Prediction Technique
    4.2 Time Delay Estimation Using Spatial Prediction
    4.3 Other Information from the Spatial Correlation Matrix
  5. Adaptive Eigenvalue Decomposition Algorithm
  6. Adaptive Multichannel Time Delay Estimation
    6.1 Principle
    6.2 Time-Domain Multichannel LMS Approach
    6.3 Frequency-Domain Adaptive Algorithms
  7. Experiments
    7.1 Experimental Setup
    7.2 Performance Measure
    7.3 Experimental Results
  8. Conclusions

9 Source Localization
  Yiteng (Arden) Huang, Jacob Benesty, Gary W. Elko
  1. Introduction
  2. Source Localization Problem
  3. Measurement Model and Cramér-Rao Lower Bound for Source Localization
  4. Maximum Likelihood Estimator
  5. Least Squares Estimators
    5.1 The Least Squares Error Criteria
    5.2 Spherical Intersection (SX) Estimator
    5.3 Spherical Interpolation (SI) Estimator
    5.4 Linear-Correction Least Squares Estimator
  6. Example System Implementation
  7. Source Localization Examples
  8. Conclusions

10 Blind Source Separation for Convolutive Mixtures: A Unified Treatment
  Herbert Buchner, Robert Aichner, Walter Kellermann
  1. Introduction
  2. Generic Block Time-Domain BSS Algorithm
    2.1 Matrix Notation for Convolutive Mixtures
    2.2 Cost Function and Algorithm Derivation
    2.3 Equivariance Property and Natural Gradient
    2.4 Special Cases and Links to Known Time-Domain Algorithms
  3. Generic Frequency-Domain BSS Algorithm
    3.1 General Frequency-Domain Formulation
    3.2 Natural Gradient in the Frequency Domain
    3.3 Special Cases and Links to Known Frequency-Domain Algorithms
  4. Weighting Function
    4.1 Off-line Implementation
    4.2 On-line Implementation
    4.3 Block-on-Line Implementation
  5. Experiments and Results
  6. Conclusions

Part IV Audio Coding and Realistic Sound Stage Reproduction

11 Audio Coding
  Gerald Schuler
  1. Introduction
  2. Psycho-Acoustics
  3. Filter Banks
    3.1 Polyphase Formulation
    3.2 Modulated Filter Banks
    3.3 Block Switching
  4. Current and Basic Coder Structures
  5. Stereo Coding
  6. Low Delay Audio Coding
  7. Conclusions

12 Sound Field Synthesis
  Sascha Spors, Heinz Teutsch, Achim Kuntz, Rudolf Rabenstein
  1. Introduction
  2. Rendering of Sound Fields with Wave Field Synthesis
    2.1 Physical Foundation of Wave Field Synthesis
    2.2 Wave Field Synthesis Based Sound Reproduction
  3. Model-Based and Data-Based Rendering
    3.1 Data-Based Rendering
    3.2 Model-Based Rendering
    3.3 Hybrid Approach
  4. Wave Field Analysis
  5. Loudspeaker and Listening Room Compensation
    5.1 Listening Room Compensation
    5.2 Loudspeaker Compensation
  6. Description of a Sound Field Transmission System