A Target-Based Articulatory Synthesizer

276 pages · 1991 · English

A TARGET-BASED ARTICULATORY SYNTHESIZER

By

PEDRO P. L. PRADO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1991

To Silvia, Patricia and Roger

ACKNOWLEDGEMENTS

I am sincerely grateful to my advisor and committee chairman, Dr. Donald G. Childers, for his competent advice, exemplary classes, discernment, guidance, enthusiasm, and continuous encouragement. I wish to thank Dr. Fred J. Taylor, Dr. J. C. Principe, Dr. A. A. Arroyo, and Dr. H. B. Rothman for their time and interest in serving on the supervisory committee. Special thanks to my colleagues at the Mind-Machine Research Center for their help and friendship. I am also indebted to the Exercito Brasileiro (Brazilian Army) for having selected me to pursue the Ph.D. degree at the University of Florida, and to the CNPq, Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (Scientific and Technological National Development Agency, Brazil), for the scholarship I was granted. Finally, my gratitude and love to my family.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
CHAPTERS
1 ARTICULATORY SYNTHESIZERS
  1.1 Introduction
  1.2 History of Articulatory Synthesizers
  1.3 The Human Vocal System and the Speech Production
    1.3.1 Description of the Vocal System
    1.3.2 Excitation Mechanisms and Place of Articulation
    1.3.3 A Basic Articulatory Synthesizer
  1.4 Models for Articulatory Synthesizers
    1.4.1 Source or Excitation Models
      1.4.1.1 Mechanical models
      1.4.1.2 Parametric models
      1.4.1.3 Data for subglottal models
      1.4.1.4 Data for glottal models
    1.4.2 Vocal and Nasal Tracts Models
      1.4.2.1 Uniform acoustic tube
      1.4.2.2 Radiation at lips and nostrils
      1.4.2.3 Hearing
      1.4.2.4 Noise source
      1.4.2.5 Concatenation of tubes
      1.4.2.6 Data for vocal tract models
  1.5 Applications of Articulatory Synthesizers
  1.6 Research Goals
  1.7 Description of Chapters
2 ARTICULATORY SYNTHESIZER MODEL
  2.1 Introduction
  2.2 Articulatory Model
    2.2.1 Articulators
    2.2.2 Area Function Estimation
    2.2.3 Implementation
  2.3 Acoustical Model
    2.3.1 Glottal Excitation
      2.3.1.1 Vocal fold vibration
      2.3.1.2 Two-mass model
      2.3.1.3 Proposed models
      2.3.1.4 Control parameters
    2.3.2 Vocal and Nasal Tract Models
    2.3.3 Radiation and Noise Source
  2.4 Analysis/Synthesis Scheme
  2.5 Results and Discussion
3 THE INVERSE MAPPING
  3.1 Introduction
  3.2 Main Issues in the Area Function Derivation
    3.2.1 The Direct Methods
    3.2.2 The Theoretical Basis for the Inverse Mapping
    3.2.3 Advantages and Disadvantages
      3.2.3.1 Lip impulse response method
      3.2.3.2 Linear prediction/acoustical tube method
      3.2.3.3 Codebook generation: numerical approaches
      3.2.3.4 Feedback methods
      3.2.3.5 Other methods
  3.3 Inverse Mapping Approach
    3.3.1 Choice of the Method
    3.3.2 Strategy for the Optimization Procedure
    3.3.3 Description of the Optimization Method
    3.3.4 Implementation of the Optimization Method
    3.3.5 Results and Discussion
4 VALIDATION AND EXPERIMENTS
  4.1 Introduction
  4.2 Female Voice Simulation
  4.3 Creaky Voice Simulation
  4.4 Results and Discussion
5 CONCLUSIONS AND SUGGESTIONS
  5.1 Introduction
  5.2 Summary and Discussion
    5.2.1 Articulatory Model Realization
    5.2.2 Development of the Acoustic Model
    5.2.3 Acoustic-to-Articulatory Transformation
    5.2.4 Validation and Experiments
  5.3 Suggestions for Future Research
    5.3.1 Exhaustive Experimentation
    5.3.2 Assessment of Losses
    5.3.3 Inverse Mapping of Nasals
    5.3.4 Interpolation of Areas and Parameters
    5.3.5 Wave Digital Filter Implementation
    5.3.6 Optimization Scheme for the Source
    5.3.7 Neural Networks or Dynamic Programming
    5.3.8 Rules for Text-to-Speech Synthesis
    5.3.9 Speaker-Independent Speech Recognition
APPENDICES
  A INVERSE MAPPING SUBROUTINES
  B DATA FILES
REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A TARGET-BASED ARTICULATORY SYNTHESIZER

By

Pedro P. L. Prado

December 1991

Chairman: D. G. Childers
Major Department: Electrical Engineering

The ultimate goal of this research was to develop a versatile articulatory synthesizer for perception studies, one that combines high-quality speech with short computation time. A new implementation of the articulatory model was developed, providing an interactive graphic editor based on computer-aided design techniques. A digital time-domain approach was used for the acoustic model. Two new glottal excitation models were proposed: a parametric two-mass model and a glottal area model. The tracts and the radiation were simulated by an equivalent resistive network. The sinuses and the turbulent noise sources were included in the synthesizer to simulate consonants. The parameters of the synthesizer can be modified for a variety of experiments, including the synthesis of different voice types and vocal disorders. Some new parameters, not available in other synthesizers, were designed into our system. A new optimization scheme was conceived for deriving vocal tract area functions of American English phonemes from speech. The scheme uses both gradient search and linear successive approximation, providing fast convergence, errors of less than 2%, and natural articulatory dynamics, while avoiding local-minimum traps.
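The formant-matching search described in this abstract can be illustrated with a minimal Python sketch. It rests on several assumptions. A lossless uniform tube, closed at the glottis and open at the lips, stands in hypothetically for the dissertation's area-to-formant transformation, which also accounts for vocal tract losses. The error function is the least-absolute-value criterion over the first three formants that the abstract names. The optimizer is a textbook Fletcher-Reeves conjugate gradient with restarts, not the author's modified algorithm. The names `uniform_tube_formants`, `lav_formant_error`, and `fletcher_reeves`, and the speed-of-sound value, are illustrative choices.

```python
import numpy as np

# Assumed speed of sound in warm, moist air (cm/s); a common textbook value.
SPEED_OF_SOUND_CM_S = 35000.0


def uniform_tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a lossless uniform tube closed at one end, open at the other.

    F_n = (2n - 1) * c / (4 * L). Hypothetical stand-in for a real
    area-function-to-formant mapping.
    """
    n = np.arange(1, n_formants + 1)
    return (2 * n - 1) * c / (4.0 * length_cm)


def lav_formant_error(model_formants, target_formants):
    """Least-absolute-value error between model- and speech-derived formants (Hz)."""
    diff = np.asarray(model_formants, float) - np.asarray(target_formants, float)
    return float(np.sum(np.abs(diff)))


def fletcher_reeves(f, grad, x0, max_iter=500, tol=1e-8):
    """Textbook Fletcher-Reeves nonlinear conjugate gradient (a generic sketch).

    Uses an Armijo backtracking line search. The search restarts with steepest
    descent every len(x) iterations, or whenever the current direction is not
    a descent direction.
    """
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        slope = g @ d
        # Periodic restart, or recovery from a non-descent direction.
        if k % x.size == 0 or slope >= 0:
            d = -g
            slope = g @ d
        fx = f(x)
        step = 1.0
        # Halve the step until the Armijo sufficient-decrease condition holds.
        while f(x + step * d) > fx + 1e-4 * step * slope and step > 1e-12:
            step *= 0.5
        x_new = x + step * d
        g_new = grad(x_new)
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x
```

For example, a 17.5 cm tube gives formants of 500, 1500, and 2500 Hz. Against measured formants of 510, 1480, and 2500 Hz, the least-absolute-value error is 30 Hz. To approach the abstract's scheme, `f` would be `lav_formant_error` composed with a real articulatory-to-formant map. However, the absolute value is not differentiable where a model formant equals its target, so a practical version would need a numerical gradient or a smoothed error. This sketch does not attempt either.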
The objective function is the least-absolute-value error between the model-derived and the speech-derived first three formants. The computational time of the gradient search was reduced by using an algorithm inspired by the Fletcher-Reeves method. Proper articulatory dynamics were achieved by considering the vocal tract losses in the area-to-formant transformation, by establishing appropriate initial configurations, by properly selecting the parameters for the optimization procedure, by imposing constraints on the relative placement of the articulators, and by using flexible on-line pictorial aids. Together, these factors resulted in an articulatory synthesizer design that produces natural-sounding synthetic speech with short computation time.

CHAPTER 1
ARTICULATORY SYNTHESIZERS

1.1 Introduction

The human body is probably the most perfect machine. For many years, scientists and researchers have recognized that engineering can benefit from imitating the natural mechanisms not only of the human body but also of other animals. Illustrative examples are the "psychophysical effects of vision," which led to an improved image processing technique (Kunt et al., 1985), and "neural networks," which many consider a revolutionary approach to complex pattern-recognition problems. In a like manner, speech researchers have tried to model the mechanisms of generation and propagation of sound waves in the human vocal system. Although formant and linear predictive coding (LPC) synthesizers have improved considerably in recent years (Atal and Caspers, 1983; Schroeder and Atal, 1985; Childers et al., 1985b; Klatt, 1987; Pinto et al., 1989; Childers and Wu, 1990), the articulatory synthesizer has greater potential to deal with several issues that are essential for producing high-quality, natural-sounding speech at low bit rates: source-tract interaction (Koizumi et al., 1985), nasals (Maeda, 1982b), parameter interpolation (Sondhi and Schroeter, 1987), and the reproduction of transitions between phonemes (Coker, 1967), among others.

The first sections of this chapter present an overview of articulatory synthesizers. Section 1.6 establishes the goals of this research, and the last section describes the remaining chapters.

1.2 History of Articulatory Synthesizers

History records von Kempelen (Flanagan, 1972b) as the pioneer who attempted to reproduce the human voice (Vienna, 1791). His device consisted of a bellows, a reed, and a leather tube whose shape was controlled by hand. Dudley (1939), however, can be considered the "Father of Electrical Speech Synthesizers" by virtue of his remarkable "VODER," the first device capable of coding voice signals. The VODER (Voice Operation Demonstrator) consisted of a bank of electronic bandpass filters controlled by a keyboard and driven either by a relaxation oscillator (voiced source) or by a noise source, depending on the position of a wrist bar. A foot pedal determined the fundamental frequency of the voiced sounds. In 1950, another important step in the area of speech production and perception was made with the "Pattern Playback," an optical-electrical synthesizer designed at
