
Speech Time-Frequency Representations PDF

168 Pages·1988·12.41 MB·English

Preview Speech Time-Frequency Representations

SPEECH TIME-FREQUENCY REPRESENTATIONS

THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE
VLSI, COMPUTER ARCHITECTURE AND DIGITAL SIGNAL PROCESSING
Consulting Editor Jonathan Allen

Other books in the series:

Logic Minimization Algorithms for VLSI Synthesis. R.K. Brayton, G.D. Hachtel, C.T. McMullen, and A.L. Sangiovanni-Vincentelli. ISBN 0-89838-164-9.
Adaptive Filters: Structures, Algorithms, and Applications. M.L. Honig and D.G. Messerschmitt. ISBN 0-89838-163-0.
Introduction to VLSI Silicon Devices: Physics, Technology and Characterization. B. El-Kareh and R.J. Bombard. ISBN 0-89838-210-6.
Latchup in CMOS Technology: The Problem and Its Cure. R.R. Troutman. ISBN 0-89838-215-7.
Digital CMOS Circuit Design. M. Annaratone. ISBN 0-89838-224-6.
The Bounding Approach to VLSI Circuit Simulation. C.A. Zukowski. ISBN 0-89838-176-2.
Multi-Level Simulation for VLSI Design. D.D. Hill and D.R. Coelho. ISBN 0-89838-184-3.
Relaxation Techniques for the Simulation of VLSI Circuits. J. White and A. Sangiovanni-Vincentelli. ISBN 0-89838-186-X.
VLSI CAD Tools and Applications. W. Fichtner and M. Morf, editors. ISBN 0-89838-193-2.
A VLSI Architecture for Concurrent Data Structures. W.J. Dally. ISBN 0-89838-235-1.
Yield Simulation for Integrated Circuits. D.M.H. Walker. ISBN 0-89838-244-0.
VLSI Specification, Verification and Synthesis. G. Birtwistle and P.A. Subrahmanyam. ISBN 0-89838-246-7.
Fundamentals of Computer-Aided Circuit Simulation. W.J. McCalla. ISBN 0-89838-248-3.
Serial Data Computation. S.G. Smith and P.B. Denyer. ISBN 0-89838-253-X.
Phonological Parsing in Speech Recognition. K.W. Church. ISBN 0-89838-250-5.
Simulated Annealing for VLSI Design. D.F. Wong, H.W. Leong, and C.L. Liu. ISBN 0-89838-256-4.
Polycrystalline Silicon for Integrated Circuit Applications. T. Kamins. ISBN 0-89838-259-9.
FET Modeling for Circuit Simulation. D. Divekar. ISBN 0-89838-264-5.
VLSI Placement and Global Routing Using Simulated Annealing. C. Sechen. ISBN 0-89838-281-5.
Adaptive Filters and Equalisers. B. Mulgrew, C.F.N. Cowan. ISBN 0-89838-285-8.
Computer-Aided Design and VLSI Device Development, Second Edition. K.M. Cham, S-Y. Oh, J.L. Moll, K. Lee, P. Vande Voorde, D. Chin. ISBN 0-89838-277-7.
Automatic Speech Recognition. K-F. Lee. ISBN 0-89838-296-3.
A Systolic Array Optimizing Compiler. M.S. Lam. ISBN 0-89838-300-5.
Algorithms and Techniques for VLSI Layout Synthesis. D. Hill, D. Shugard, J. Fishburn, K. Keutzer. ISBN 0-89838-301-3.

SPEECH TIME-FREQUENCY REPRESENTATIONS
by Michael D. Riley
AT&T Bell Laboratories

KLUWER ACADEMIC PUBLISHERS
BOSTON/DORDRECHT/LONDON

Distributors for North America:
Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, Massachusetts 02061, USA

Distributors for the UK and Ireland:
Kluwer Academic Publishers, Falcon House, Queen Square, Lancaster LA1 1RN, UNITED KINGDOM

Distributors for all other countries:
Kluwer Academic Publishers Group, Distribution Centre, Post Office Box 322, 3300 AH Dordrecht, THE NETHERLANDS

Library of Congress Cataloging-in-Publication Data

Riley, Michael D. (Michael Dennis), 1955-
Speech time-frequency representations / by Michael D. Riley.
p. cm. -- (The Kluwer international series in engineering and computer science; 63. VLSI, computer architecture, and digital signal processing)
Bibliography: p.
Includes index.
ISBN-13: 978-1-4612-8417-8
e-ISBN-13: 978-1-4613-1079-2
DOI: 10.1007/978-1-4613-1079-2
1. Automatic speech recognition. I. Title. II. Series: Kluwer international series in engineering and computer science; SECS 63. III. Series: Kluwer international series in engineering and computer science. VLSI, computer architecture, and digital signal processing.
TK7882.S65R55 1989
006.4'54--dc19
88-7585 CIP

Copyright © 1989 by Kluwer Academic Publishers
Softcover reprint of the hardcover 1st edition 1989
All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, Massachusetts 02061.

Contents

1 INTRODUCTION

2 THE TIME-FREQUENCY ENERGY REPRESENTATION
2.1. The stationary case
2.2. The quasi-stationary case
2.3. Non-stationarity
2.4. Joint time-frequency representations
2.5. Design criteria for time-frequency representations
2.6. Relations among the design criteria
2.7. Satisfying the design criteria
2.8. Directional time-frequency transforms
2.9. A speech example

3 TIME-FREQUENCY FILTERING
3.1. The stationary case
3.2. Non-stationary vocal tract
3.3. Time-frequency filtering
3.4. The stationary case - re-examined
3.5. Linearly varying modulation frequency
3.6. The quasi-stationary case
3.7. Smoothly varying modulation frequency
3.8. The vocal tract transfer function
3.9. The transmission channel
3.10. The excitation

4 THE SCHEMATIC SPECTROGRAM
4.1. Rationale
4.2. Spectral Peaks
4.3. Time-frequency ridges - non-directional kernel
4.4. Time-frequency ridges - directional kernel
4.5. Signal detection and ridge identification
4.6. Continuity and grouping
4.7. A perspective

5 A CATALOG OF EXAMPLES
5.1. Some general examples
5.2. Liquids and glides
5.3. Nasalized vowels
5.4. Consonant-vowel transitions
5.5. Female speech
5.6. Transmission channel effects

REFERENCES
INDEX

Figures

1 INTRODUCTION
1.1. Steps in the initial auditory processing.

2 THE TIME-FREQUENCY ENERGY REPRESENTATION
2.1. Short-time spectrum of a steady-state /i/.
2.2. Smoothed short-time spectra.
2.3. Short-time spectra of linear chirps.
2.4. Short-time spectra of /w/'s.
2.5. Wide band spectrograms of /w/'s.
2.6. Spectrograms of rapid formant motion.
2.7. Wigner distribution and spectrogram.
2.8. Wigner distribution and spectrogram of cos ω₀t.
2.9. Concentration ellipses for transform kernels.
2.10. Concentration ellipses for complementary kernels.
2.11. Directional transforms for a linear chirp.
2.12. Spectrograms of /wioi/ with different window sizes.
2.13. Wigner distribution of /wioi/.
2.14. Time-frequency autocorrelation function of /wioi/.
2.15. Gaussian transform of /wioi/.
2.16. Directional transforms of /wioi/.

3 TIME-FREQUENCY FILTERING
3.1. Recovering the transfer function by filtering.
3.2. Estimating 'aliased' transfer function.
3.3. T-F autocorrelation function of an impulse train.
3.4. T-F autocorrelation function of LTI filter output.
3.5. Windowing recovers transfer function.
3.6. Shearing the time-frequency autocorrelation function.
3.7. T-F autocorrelation function for FM filter.
3.8. T-F autocorrelation function of FM filter output.
3.9. Windowing recovers transfer function.

4 THE SCHEMATIC SPECTROGRAM
4.1. Problems with pole-fitting approach.
4.2. Peaks in spectral cross-sections of the energy surface.
4.3. Gradient and curvature vectors near rising F2.
4.4. 2-D ridge computation applied to the energy surface.
4.5. Two conditions for ridge detection.
4.6. Tuning curves for gaussian transform kernels.
4.7. Transform kernel ¢>(t, J) = -;P¢>(t,J).
4.8. Tuning curves for kernels of the form in Figure 4.7.
4.9. Ridge tops computed with directional transform.
4.10. Hysteresis thresholding.
4.11. Two contours competing for labelling as F2.
4.12. Turning an /i/ into an /u/ by filtering.
4.13. Spectrogram of /wi/.
4.14. Merging formants.
4.15. Contour junctions located by simple proximity rules.

5 A CATALOG OF EXAMPLES
5.1. "May we all learn a yellow lion roar."
5.2. "Are we winning yet?"
5.3. "We were away a year ago."
5.4. "Why am I eager?"
5.5. /w/'s at various speech rates.
5.6. Syllable initial /ju/'s at various speech rates.
5.7. Syllable initial /l/'s.
5.8. /r/'s in various vowel contexts.
5.9. Nasalized vowels.
5.10. Syllable initial /b/'s.
5.11. Syllable initial /d/'s.
5.12. Syllable initial /g/'s.
5.13. Rapid formant transitions.
5.14. /uiuiui/ uttered by adult female.
5.15. Transmission channels.
5.16. Broadband channel's effects.
5.17. Narrowband channel's effects.
5.18. Stopband channel's effects.

Acknowledgments

This book is based on my Ph.D. thesis in E.E.C.S. from M.I.T., submitted in May 1987. I wish to thank Tom Knight for having supervised this work. I also thank my readers Tomaso Poggio, Victor Zue, and especially Mark Liberman. I am most grateful to Patrick Winston for his support at the M.I.T. Artificial Intelligence Laboratory and Osamu Fujimura, Max Matthews and James Flanagan for their support at Bell Laboratories.
