Speech Processing, Recognition and Artificial Neural Networks

Springer
London Berlin Heidelberg New York Barcelona Hong Kong Milan Paris Santa Clara Singapore Tokyo

Gerard Chollet, Maria Gabriella Di Benedetto, Anna Esposito and Maria Marinaro (Eds)

Proceedings of the 3rd International School on Neural Nets "Eduardo R. Caianiello"

Gerard Chollet, PhD
ENST - CNRS URA 820, 46 rue Barrault, 75634 Paris Cedex 13, France

Maria Gabriella Di Benedetto, PhD
INFOCOM Department, Rome University "La Sapienza", via Eudossiana 18, I-00184 Rome, Italy

Anna Esposito, PhD
IIASS, via G. Pellegrino 19, I-84019 Vietri sul Mare (SA), Italy

Maria Marinaro, PhD
IIASS, via G. Pellegrino 19, I-84019 Vietri sul Mare (SA), Italy

ISBN-13: 978-1-85233-094-1 Springer-Verlag London Berlin Heidelberg

British Library Cataloguing in Publication Data
Speech processing, recognition and artificial neural networks: proceedings of the 3rd International School on Neural Nets "Eduardo R. Caianiello"
1. Speech processing systems - Congresses 2. Speech perception - Congresses 3. Neural networks (Computer science) - Congresses I. Chollet, Gerard II. International School on Neural Nets (3rd: 1998: Salerno, Italy)
006.4'54

ISBN-13: 978-1-85233-094-1
e-ISBN-13: 978-1-4471-0845-0
DOI: 10.1007/978-1-4471-0845-0

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
© Springer-Verlag London Limited 1999

The use of registered names, trademarks etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use. The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Typesetting: Camera ready by contributors

Organizing-Scientific Committee:
Gerard Chollet (ENST - CNRS URA, Paris, France)
Maria Gabriella Di Benedetto (INFOCOM - University "La Sapienza", Rome, Italy)
Anna Esposito (IIASS - E.R. Caianiello, Vietri sul Mare, Italy)
Maria Marinaro (Dept. of Theoretical Physics, Salerno University, and IIASS - E.R. Caianiello, Vietri sul Mare, Italy)

The sponsorship and support of the following are gratefully acknowledged:
* International Institute for Advanced Scientific Studies "E.R. Caianiello" (IIASS)
* European Speech Communication Association (ESCA)
* University of Salerno

List of Participants
Acernese, Fausto - Universita di Salerno, Italy
Al-Ghoneim, Khaled - King Saud University, Riyadh, Saudi Arabia
Al-Sulaiman, Mansour Mohammed - King Saud University, Riyadh, Saudi Arabia
Albesano, Dario - CSELT, Torino, Italy
Alwan, Abeer - Dept. of Electrical Engineering, 66-147E IV, UCLA, Los Angeles, USA
Amato, Daniela - IIASS, Vietri sul Mare, Italy
Auckenthaler, Roland - University of Wales, UK
Aversano, Guido - Universita di Salerno, Italy
Budillon, Alessandra - Universita di Napoli "Federico II", Naples, Italy
Calabrese, Andrea - Linguistics Dept., Storrs, USA
Cassese, Maria - IIASS, Vietri sul Mare, Italy
Cellesi, Francesco - INFN, Naples, Italy
Cheboub, Leila - TCTS Lab, Faculte Polytechnique de Mons, Belgium
Chen, Fang - IKP/IAV Linkoping University, Sweden
Chollet, Gerard - CNRS URA-820, ENST, Dept. SIG, Paris, France
Cosi, Piero - Istituto di Fonetica e Dialettologia, CNR, Padova, Italy
D'Allievo, Margherita Leondina - Universita di Pescara, Italy
D'Eugenio, Mirko - Universita dell'Aquila, Italy
De Giovanni, Maria Teresa - IIASS, Vietri sul Mare, Italy
De Mori, Renato - Laboratoire Informatique (LIA), Univ. d'Avignon, France
Di Benedetto, Maria G. - INFOCOM, University of Rome, Italy
Di Fabio, Mirko - Universita dell'Aquila, Italy
Di Perna, Francesco - Universita dell'Aquila, Italy
Di Toma, Elisabetta - Universita dell'Aquila, Italy
Donato, Daniel - Universita di Salerno, Italy
Donato, I. Yuri - Universita di Salerno, Italy
Dos Santos, Camargo Laurizete - Instituto Tecnologico de Aeronautica, Sao Jose dos Campos, Brazil
Esposito, Anna - IIASS, Vietri sul Mare, Italy
Farinas, Jerome - Institut de Recherche en Informatique de Toulouse, France
Giaquinta, Maria - IIASS, Vietri sul Mare, Italy
Gili Fivela, Barbara - Scuola Normale Superiore, Pisa, Italy
Giovannardi, Maurizio - INFOCOM, University of Rome, Italy
Goronzy, Silke - Sony International (Europe) GmbH, Stuttgart Tech. Center, Germany
Granstrom, Bjorn - Dept. of Speech, Music and Hearing, Centre for Speech Technology, Stockholm, Sweden
Haton, Jean-Paul - Univ. Henri Poincare, Nancy 1, Membre de l'Institut Universitaire de France, LORIA/INRIA BP 239
Heldner, Mattias - Umea University, Department of Phonetics, Sweden
Hermansky, Hynek - Oregon Graduate Institute, USA
Houard, Sarah - TCTS Lab, Faculte Polytechnique de Mons, Belgium
Izzo, Graziano - IIASS, Vietri sul Mare, Italy
Jesus, Luis - ISIS Research Group, Dept. of Electronics and Computer Science, Southampton, UK
Lienard, Jean S. - LIMSI-CNRS, Orsay, France
Marinaro, Maria - IIASS, Vietri sul Mare and Universita di Salerno, Italy
Mattera, Davide - Universita di Napoli "Federico II", Italy
Mauuary, Laurent - Centre National d'Etudes des Telecommunications DIH/DIPS, Technopole Anticipa, Lannion, France
Ney, Hermann - Lehrstuhl für Informatik VI, RWTH Aachen, University of Technology, Germany
Ohala, John J. - Dept. of Linguistics, University of California, Berkeley, USA
Padovani, Marco - IIASS, Vietri sul Mare, Italy
Palma, Giuseppe - IIASS, Vietri sul Mare, Italy
Palmieri, Francesco - Universita di Napoli "Federico II", Italy
Riccio, Giovanni - IIASS, Vietri sul Mare, Italy
Rigotti, Camilla - Politecnico di Milano, Dip. di Bioingegneria, Italy
Rojc, Matej - University of Maribor, Maribor, Slovenia
Savino, Michelina - D.E.E. Politecnico di Bari, Italy
Sazhok, Mykola - NAS Institute of Cybernetics and UNESCO/IIP, Kyjiv, Ukraine
Tounsi, Nawfal - TCTS Lab, Faculte Polytechnique de Mons, Belgium
Vicinanza, Sergio - IIASS, Vietri sul Mare, Italy
Vignoli, Fabio - DIST, University of Genova, Italy
Vintsiuk, Taras - NAS Institute of Cybernetics and UNESCO/IIP, Kyjiv, Ukraine
Vitulano, Domenico - IAC, Rome, Italy
Voglino, Giuseppe - ST Microelectronics Central R&D, DAIS
Weber, Andrea C. - Max-Planck-Institute for Psycholinguistics, Holland

Preface

This volume contains invited and contributed papers presented at the International Summer School on Neural Nets "E.R. Caianiello" on "Speech Processing, Recognition, and Artificial Neural Networks", held in Vietri sul Mare, Salerno, Italy, October 5-14, 1998. The primary aim of this book is to provide high-level tutorial coverage of the fields related to speech science and linguistics, including automatic speech recognition, connectionist approaches to speech, and Hidden Markov Models (HMMs). Twelve surveys are offered by specialists in the field, so the volume may also be used as a reference book on speech.
Six contributed papers, which present original research and complement the tutorials, are also included. The volume is divided into five sections: Fundamentals of Speech Analysis and Perception, Speech Processing, Stochastic Models for Speech, Auditory and Neural Network Models for Speech, and Task-Oriented Applications of Automatic Speech Recognition.

Fundamentals of Speech Analysis and Perception deals with problems related to acoustic-phonetic theory, in which basic speech sounds are characterized both by their linguistic properties and by the associated acoustic measurements. Fundamental and innovative ideas in speech perception are also covered. This section contains four tutorial papers. The first tutorial, by John Ohala, discusses the aerodynamic forces that underlie speech sounds. The second, by Andrea Calabrese, describes and accounts for various phonetic and phonological processes that characterize normal connected speech. The tutorial by Maria Gabriella Di Benedetto and Anna Esposito deals with the problem of identifying a set of acoustic and perceptual features for characterizing basic speech sounds (vowels and consonants). Finally, the tutorial by Jean-Sylvain Lienard reports on the physiological, psychological and linguistic mechanisms that underlie speech perception.

Speech Processing covers both new and standard techniques for extracting the speech features used in recognition systems. This section contains two tutorials. The first, by Hynek Hermansky, reports on early speech analysis techniques and on recent auditory modeling for speech analysis. The second, by Abeer Alwan et al., deals with two-dimensional preprocessing techniques that combine the articulatory and acoustic information embedded in the speech signal.
Stochastic Models for Speech includes the tutorials by Hermann Ney (Search Strategies and Statistical Translation) and Renato De Mori (Language Models); it describes speech recognition systems based on HMMs and the methods used for selecting and modeling speech units. This section also contains four contributed papers that complement the tutorials.

Auditory and Neural Network Models for Speech includes the tutorials by Piero Cosi (Auditory Models) and Jean-Paul Haton (Neural Networks), and provides an overview of recent applications to speech analysis and recognition that use biological models. Two contributed papers dealing with phoneme classification using neural networks are also included.

Task-Oriented Applications of Automatic Speech Recognition and Synthesis includes the papers by Gerard Chollet et al. (Speech Technology for Telephone Interactive Voice Servers) and Bjorn Granstrom (Multi-Modal Speech Synthesis). This section analyzes the principles that determine whether task-oriented applications succeed.

The editors would like to thank the European Speech Communication Association (ESCA), in particular Professor Wolfgang Hess, the International Institute for Advanced Scientific Studies "E.R. Caianiello", and the University of Salerno for their support in sponsoring, financing, and organizing the school. The editors are also grateful to the contributors to this volume, whose work stimulated an extremely interesting interaction with the attendees, and to the attendees themselves, who deserve recognition for being so highly motivated and bright. It is to them that this book is dedicated.

The editors: Gerard Chollet, Maria Gabriella Di Benedetto, Anna Esposito, Maria Marinaro.

Contents

Section 1 - Fundamentals of Speech Analysis and Perception

Articulatory Constraints on Distinctive Features
John J. Ohala, p. 3

"Herr Müller vivrà a Taranto con i suoi colleghi austriaci": Investigations on a Fragment of Italian Phonology
Andrea Calabrese, p. 21

Acoustic Analysis and Perception of Classes of Sounds (Vowels and Consonants)
Maria Gabriella Di Benedetto and Anna Esposito, p. 54

Speech and Voice Perception: Beyond Pattern Recognition
Jean-Sylvain Lienard, p. 85

Section 2 - Speech Processing

Analysis in Automatic Recognition of Speech
Hynek Hermansky, p. 115

Speech Production and Perception Models and their Applications to Synthesis, Recognition, and Coding
Abeer Alwan, S. Narayanan, B. Strope and A. Shen, p. 138

Section 3 - Stochastic Models for Speech

Statistical Methods for Automatic Speech Recognition
Renato De Mori, p. 165

Statistical Modelling: from Speech Recognition to Text Translation
Hermann Ney, p. 190

Continuous Speech Recognition with Neural Networks: An Application to Railway Timetables Enquiries
Dario Albesano, Franco Mana and Roberto Gemello, p. 216