
Computer Speech: Recognition, Compression, Synthesis PDF

338 Pages·1999·22.962 MB·English

Preview Computer Speech: Recognition, Compression, Synthesis

Springer Series in Information Sciences 35

Springer-Verlag Berlin Heidelberg GmbH

Springer Series in Information Sciences
Editors: Thomas S. Huang, Teuvo Kohonen, Manfred R. Schroeder

30 Self-Organizing Maps. By T. Kohonen. 2nd Edition
31 Music and Schema Theory: Cognitive Foundations of Systematic Musicology. By M. Leman
32 The Maximum Entropy Method. By N. Wu
33 A Few Steps Towards 3D Active Vision. By T. Vieville
34 Calibration and Orientation of Cameras in Computer Vision. Editors: A. Grün and T. S. Huang
35 Computer Speech: Recognition, Compression, Synthesis. By M. R. Schroeder

Volumes 1-29 are listed at the end of the book.

Manfred R. Schroeder
Computer Speech: Recognition, Compression, Synthesis
With Introductions to Hearing and Signal Analysis and a Glossary of Speech and Computer Terms
With 78 Figures
Springer

Professor Dr. Manfred R. Schroeder, Drittes Physikalisches Institut, Universität Göttingen, Bürgerstrasse 42-44, D-37073 Göttingen, Germany

Series Editors:
Professor Thomas S. Huang, Department of Electrical Engineering and Coordinated Science Laboratory, University of Illinois, Urbana, IL 61801, USA
Professor Teuvo Kohonen, Helsinki University of Technology, Neural Networks Research Centre, Rakentajanaukio 2 C, FIN-02150 Espoo, Finland
Professor Dr. Manfred R. Schroeder, Drittes Physikalisches Institut, Universität Göttingen, Bürgerstrasse 42-44, D-37073 Göttingen, Germany

Excerpts from Music Perception, IEEE Proceedings, and Cartoons from The New Yorker reprinted with permission.

ISSN 0720-678X
ISBN 978-3-662-03863-5
ISBN 978-3-662-03861-1 (eBook)
DOI 10.1007/978-3-662-03861-1

Library of Congress Cataloging-in-Publication Data applied for.

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Schroeder, Manfred R.: Computer speech: recognition, compression, synthesis; with introductions to hearing and signal analysis and a glossary of speech and computer terms / Manfred R. Schroeder.
- Berlin; Heidelberg; New York; Barcelona; Hong Kong; London; Milan; Paris; Singapore; Tokyo: Springer, 1999 (Springer series in information sciences; 35)

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1999
Originally published by Springer-Verlag Berlin Heidelberg New York in 1999.
Softcover reprint of the hardcover 1st edition 1999

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Data conversion by Satztechnik Katharina Steingraeber, Heidelberg
Cover design: design & production GmbH, Heidelberg
SPIN: 10676879 56/3144 - 5 4 3 2 1 0 - Printed on acid-free paper

To Anny Marion, Julian, Alexander Marion, Uwe Julia, Lilly, Nora, Nicola

Preface

World economies are increasingly driven by research, knowledge, information, and technology - with no small part being played by computer speech:

• automatic speech recognition and speaker authentication,
• speech compression for mobile phones, voice security and Internet "real audio,"
• speech synthesis and computer dialogue systems.

According to the London Economist information technology represents "a change even more far-reaching than the harnessing of electrical power a century ago ...
Where once greater distance made communications progressively more expensive and complicated, now distance is increasingly irrelevant."¹

Natural speech, predominant in human communication since prehistoric times, has acquired a brash new kin: synthetic or computer speech. While people continue to use speech liberally, many of us are increasingly exposed to speech manufactured and understood by computers, compressed or otherwise modified by digital signal processors or microchips. This evolution, driven by innumerable technical advances, has been described in a profusion of scientific papers and research monographs aimed at the professional specialist. The present volume, by contrast, is aimed at a larger audience: all those generally curious about computer speech and all it portends for our future at home and in the workplace.

I have therefore kept the technical details in the first chapters to a minimum and relegated any unavoidable "mathematics" to the end of the book. Considering the importance of hearing for speech compression, speech synthesis (and even recognition), I have included a brief overview of hearing, both monaural and binaural. I have also added a compact review of signal analysis because of its relevance for speech processing. For the benefit of readers new to speech and computers I have added a glossary of terms from these fields. I have also augmented the numbered references by a list of books for general reading, selected journals, and major meetings on speech.

¹ Frances Cairncross, The Economist, 13 September 1997. Quoted from The New York Review of Books, 26 March 1998 (p. 29). But see also Thomas K. Landauer, The Trouble with Computers (MIT Press, Cambridge, MA 1995).

Progress in any field does not arise in a vacuum but happens in a manifestly human environment.
But many scientists, Carl Friedrich Gauss, the "Prince of Mathematicians" foremost among them, take great pride in obliterating in their publications any trace of how success was achieved. From this view, with all due respect for the great Gauss, I beg to differ. Of course, in mathematics, the "conjecture - lemma - theorem - proof" cycle will always remain the mainstay of progress. But I believe that much is to be gained - and not only for the nonspecialist - by putting scientific advances in a personal context.

Some Personal Recollections

This book has its genesis in a visit by a Bonn linguist to the University of Göttingen in Germany shortly before Christmas 1951. The noted speech scientist had been invited to give a talk at the General Physics Colloquium, not on the usual nuts-and-bolts exploits in physics, but on Shannon's communication theory and its potential importance for human language, written or spoken.

After the distinguished guest had safely departed, most of the physicists present professed "not to have understood a word." During the ensuing departmental Christmas party the chairman, the reigning theoretical physicist, asked me how I had liked the lecture. When I admitted that (although far from having understood everything) I was deeply impressed, the professor's answer was a disapproving stare. How could anything be interesting, he seemed to be saying, that did not partake of Planck's quantum of energy or general relativistic covariance.

I was working on concert hall acoustics at the time, using microwave cavities as a convenient model for acoustic enclosures. I was soon astounded by the chaotic distribution of the resonances that my measurements revealed - even for relatively small deviations from perfect geometric symmetry of the cavity: I had stumbled on the very same probability distribution that also governs the energy levels of complex atomic nuclei and that Eugene Wigner in Princeton had already promulgated in 1935.
But at that time and for decades thereafter few (if any) physicists appreciated its general import. Now, 60 years later, the Wigner distribution is recognized as a universal tell-tale sign of "non-integrable" dynamical systems with chaotic behavior. But, of course, chaos hadn't been "invented" yet and I for one didn't know enough nuclear physics - too much like chemistry I thought - to see the connection. Nor, I think it is safe to say, did atomic physicists know enough about concert halls or microwave cavities to appreciate the common thread. Interestingly, the Wigner distribution also raises its head in - of all things - number theory, where it describes the distribution of the (interesting) zeros of the Riemann zeta-function.

Be that as it may, my thesis advisor was much impressed by my progress and suggested that I go to America, the "microwave country." The thought had crossed my mind and it did sound interesting. So I applied for a Fulbright Fellowship - and failed flat out. My academic grades were good, my English was passable but, the Fulbright commission concluded, I was not politically active enough. They were looking for foreign students who, upon their return from the States, could be expected to "spread the gospel" about democracy, the American Way of Life and so forth. But I, it seemed, was mostly interested in physics, mathematics, languages, photography, tennis and - perish the thought - dancing.

During my thesis on the statistical interference of normal modes in microwave cavities and concert halls, I discovered the work of S. O. Rice on random noise, which explained a lot about sound transmission in concert halls.² Now Rice, I knew, was with Bell and so was Shannon. So after my Fulbright failure I went to my professor and asked him for a recommendation for Bell Laboratories. But he told me that Bell didn't take any foreigners.
In fact, in 1938 he had backed one of his better students (who was eager to leave Germany) but Bell declined. (I still wonder what was behind that rejection.)

Then, in early 1954, James Fisk (later president of Bell Laboratories) and William Shockley (of transistor fame) traveled to Germany (Fisk had studied in Heidelberg) to look for young talent for Bell. When I heard about this, I immediately went again to my professor telling him that Bell was not only accepting foreigners but they were actively looking for them. This time around the kind professor promised a recommendation and just three weeks later I received an invitation from Bell for an employment interview in London.

² It transpired that such transmission functions were basically random noise, albeit in the frequency domain, and that room acousticians, for years, had measured nothing but different samples of the same random noise in their quest for a formula to pinpoint acoustic quality. Within a limited cohort of acousticians I became quite well known for this work although it was just an application of Rice's theory. But it did establish a new area of research in acoustics (and later microwaves and coherent optics): random wave fields, with many interesting applications: acoustic feedback stability of hands-free telephones and public address systems, fading in mobile communications and laser speckle statistics.

An Unusual Interview

The venue of the interview was the lobby of the Dorchester Hotel. After explicating my thesis work, I asked the recruiter to tell me a bit about the Bell System. "Well, there was AT&T, the parent company, Western Electric, the manufacturing arm, Bell Laboratories, and 23 operating companies: New York Telephone, New Jersey Bell, Southern Bell ..." when, in the middle of this recitation, he stopped short and, with his eyes, followed an elegant young lady (an incognito countess?) traversing the long lobby. After a minute or two, without losing a beat, he continued "yes, and there is Southwestern Bell, Pacific Telephone, ..."³

Everything went well during the interview, and I soon received an offer of employment. The monthly salary was $640 - five times as much as a young Ph.D. could have made at Siemens (500 Marks). But, of course, at Bell I would have worked for nothing.

On September 30, 1954, arriving in New York, I stepped off the Andrea Doria (still afloat then) and into a chauffeured limousine which took me and my future director and supervisor to one of the best restaurants in the area. I couldn't read the menu (mostly in French) - except for the word Bratwurst. So, with champagne corks popping and sparkling desserts going off at neighboring tables, everybody in our party had Bratwurst and Löwenbräu beer (my future bosses were obviously very polite people) - I wonder how many immigrants were ever received in such style.

Arriving at the Murray Hill Labs, I was put on the payroll and given a dollar bill as compensation for all my future inventions. (When I retired 33 years later I had garnered 45 U.S. patents and innumerable foreign filings, but the fact that I had earned less than 3 cents for every invention never bothered me.)

Once securely ensconced at Murray Hill, I was encouraged to continue my work on random wave fields but - in typical youthful hubris - I thought I had solved all relevant problems and I elected to delve into speech. Speech, after all, meant language - always a love of mine - and possibly relevant to the telephone business.

The Bachelor and the Hoboken Police

William H. Doherty, my first Executive Director at Bell, introduced me to one and all as "Dr. Schroeder, who just joined us from Germany - and he is a bachelor." I was 28 then but apparently already getting a bit old for a single Catholic male.
(I later had to disappoint Bill by marrying in an interdenominational ceremony - at the Chapel of the Riverside Church on the Hudson - my wife being Orthodox.)

When one night, carrying a camera with a very long lens, I was arrested by dockworkers on the Hoboken piers (believing they had caught a foreign spy),

³ Twenty-five years later, to the day, on April 25, 1979, I went back to the Dorchester. At the far end of the lobby there was a kind of hat-check counter with an elderly lady behind it. I went up to her and asked "Could it be that on this day, 25 years ago, on April 25, 1954, a Sunday, a young woman might have appeared from the door behind you - it was about 2 p.m. - crossed the lobby and then exited by the revolving door?" She must have thought I was from Scotland Yard or something. But, unflustered, she answered "Oh yes, of course, at 2 p.m. we had a change of shifts then. This entrance was for service personnel - chambermaids and so forth." I asked her, "How do you know this?" And she said, "I have been here for 30 years."
