
A FLEXIBLE AND HIGH QUALITY ARTICULATORY SPEECH SYNTHESIZER

BY

YU-FU HSIEH

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1994

To my parents, and
to my daughter, Alice, and my wife, H. C. Chen

ACKNOWLEDGEMENTS

I wish to express my gratitude to my supervisor and committee chairman, Dr. D. G. Childers, for his invaluable advice, generous guidance, and constant encouragement. His critical comments and suggestions during the research and the writing have contributed to the form and contents of the dissertation presented here. I would like to thank Dr. J. C. Principe for his invaluable suggestions.

Thanks also go to Dr. L. W. Couch II, Dr. F. J. Taylor, and Dr. H. B. Rothman for their time and interest in serving on the supervisory committee. Special thanks go to my colleagues at the Mind-Machine Interaction Research Center for their help and friendship.

I am also indebted to the Chung-Shan Institute of Science and Technology for having selected me to pursue the Ph.D. degree at the University of Florida and for granting me the scholarship.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT

CHAPTERS

1 INTRODUCTION
  1.1 The Mechanisms of Speech Production
    1.1.1 Speech-sound Sources
    1.1.2 Acoustic Modulation
  1.2 Speech Synthesis Models
    1.2.1 Formant Synthesis
    1.2.2 Linear Prediction (LP) Synthesis
    1.2.3 Articulatory Synthesis
  1.3 Research Goals and Methodology
    1.3.1 Research Goals
    1.3.2 Research Methodology
  1.4 Description of Chapters

2 ARTICULATORY SYNTHESIZER MODEL
  2.1 Review of Articulatory Models
    2.1.1 Parametric Area Models
    2.1.2 Midsagittal Distance Models
  2.2 Implementation of Articulatory Models
    2.2.1 Articulatory Parameters and Midsagittal Vocal Tract Outline
    2.2.2 Determination of the Vocal Tract Section Lengths and Cross-sectional Areas
    2.2.3 Calculation of Formant Frequencies from the Vocal Tract Cross-sectional Areas
    2.2.4 Estimate the Vocal Tract Cross-sectional Area from the Formant Frequencies
  2.3 Acoustic Models
    2.3.1 Vocal Tract Models
      2.3.1.1 Sound propagation
      2.3.1.2 Uniform lossless section
      2.3.1.3 Approaches for vocal tract simulation
      2.3.1.4 Wall impedance
    2.3.2 Nasal Tract and Sinus Cavities
    2.3.3 Radiation Models of Lips and Nostrils
    2.3.4 Excitation Source Models
      2.3.4.1 Excitation at the glottis
      2.3.4.2 Excitation in the vocal tract
    2.3.5 Glottal Impedance and Subglottal Models
    2.3.6 Noise Source Models
  2.4 Analysis of Various Vocal System Characteristics
    2.4.1 Frequency-Dependent Components
    2.4.2 Number of Vocal Tract Sections
    2.4.3 Nasal Tract System
    2.4.4 Glottal Impedance and Subglottal System
    2.4.5 Excitation in the Vocal Tract
  2.5 Articulatory Synthesizers
    2.5.1 Realization
    2.5.2 Interpolation Functions
  2.6 Summary

3 SPEECH INVERSE FILTERING
  3.1 Review of the Derivations of the Vocal Tract Area Function
    3.1.1 Direct Measurements
    3.1.2 Estimation from Acoustic Data
  3.2 Simulated Annealing Algorithms
    3.2.1 Origin of the Algorithm
    3.2.2 The Cooling Schedule
    3.2.3 The Simulated Annealing Algorithm
  3.3 Speech Inverse Filtering Strategy and Procedure
    3.3.1 Strategy
    3.3.2 Procedure
  3.4 Results and Remarks
    3.4.1 Optimization of Vowels
    3.4.2 Optimization for a Sentence
    3.4.3 Remarks
  3.5 Summary

4 SOFTWARE SYSTEM FOR ARTICULATORY SYNTHESIS
  4.1 Analysis Phase
  4.2 Speech Inverse Filtering Phase
  4.3 Excitation Generation Phase
  4.4 Synthesis Phase
  4.5 Summary

5 EXPERIMENTS
  5.1 Experiment A
  5.2 Experiment B
  5.3 Experiment C
  5.4 Experiment D
  5.5 Experiment E
  5.6 Experiment F
  5.7 Summary

6 CONCLUSIONS AND RESEARCH EXTENSIONS
  6.1 Summary
    6.1.1 Articulatory Model Implementation
    6.1.2 Acoustic Model Realization
    6.1.3 Articulatory Synthesizer Implementation
    6.1.4 Speech Inverse Filtering
    6.1.5 Articulatory Synthesis Software System
    6.1.6 Experiments
  6.2 Extended Researches
    6.2.1 Optimization with Formant Frequencies and Bandwidths
    6.2.2 Speech Inverse Filtering of Nasals, Fricatives, and Plosives
    6.2.3 Improvement of the Turbulence Noise Source Model
    6.2.4 Glottal Inverse Filtering and LF Model Parameters Extraction
    6.2.5 Interpolation of the Articulatory Trajectories
    6.2.6 Source Model for Excitation Relocation

APPENDICES

A A COLLECTION OF FEATURES FOR TYPICAL AMERICAN VOWELS
B ACOUSTIC TRANSFER FUNCTION CALCULATION
C DERIVATION OF DISCRETE-TIME ACOUSTIC EQUATIONS
D GUIDELINE AND APPLIED SENTENCE RESULTS OF THE OPTIMIZATION PROCEDURE

REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A FLEXIBLE AND HIGH QUALITY ARTICULATORY SPEECH SYNTHESIZER

By

Yu-Fu Hsieh

December, 1994

Chairman: Dr. D. G. Childers
Major Department: Electrical Engineering

The aim of this research was to develop one solution to the speech inverse filtering problem and to develop a flexible, high quality articulatory speech synthesis tool. The results of this study will be of interest to researchers in speech modeling, analysis, and synthesis. A software program called ARTM was implemented as an articulatory synthesis tool. One feature of this research tool is the simulated annealing optimization procedure that is used to optimize the vocal tract parameters to match a specified set of formant characteristics. Another aspect of this study is the derivation of a new form of the acoustic equations that include the subglottal system, the glottal impedance, the turbulence noise source, and the nasal tract with sinus cavities for the articulatory synthesizer.

A flexible articulatory model was designed with special interfaces that provide for numerical specification of parameters as well as sliding bar capabilities that allow parameter adjustments.
A transmission-line circuit model of the vocal system, which includes the vocal tract, the nasal tract with sinus cavities, the glottal impedance, the subglottal tract, the excitation source, and the turbulence noise source, was constructed. The acoustic equations of the vocal system were rederived for the proposed articulatory synthesizer. A digital time-domain approach was used to simulate the dynamic properties of the vocal system as well as to improve the quality of the synthesized speech.

A new, efficient analysis scheme that identifies the articulatory parameters from the acoustic speech waveforms was introduced. The algorithm is known as simulated annealing, which is constrained to avoid non-unique solutions and local minima problems. The constraints were determined by the articulatory-to-acoustic transformation function and the boundary conditions for the articulatory parameters. The cost function was defined as a percentage of the weighted least-absolute-value error distance between the first four formant frequencies of the articulatory model and the first four formant frequencies determined from speech analysis. A 1% error criterion was found to be both practical and achievable.

CHAPTER 1
INTRODUCTION

Speech is perhaps the most distinctive capability of the human species. Speech is our everyday communication medium. Thus, it is natural that engineers and speech scientists have an interest in analyzing, recognizing, and synthesizing speech. Speech science research spans three basic areas: speech acoustics, speech perception, and speech physiology. One of the most important aspects of instrumentation for speech science is speech synthesis, which can serve as a model of speech production and provide a mechanism for the mechanical production of speech. An articulatory speech synthesizer is a marriage of acoustic and physiological techniques as well as a model of the human articulatory system. The aim of this dissertation is to construct an articulatory speech synthesis software system for the study of speech acoustics and speech physiology.
Understanding the human speech production process is important not only in speech synthesis but also in automatic speech recognition and in the digital coding of speech. We first introduce the mechanisms of speech production, followed by an overview of some existing speech synthesis models. We then outline the goals and the plans of this research, and describe the content of other chapters.

1.1 The Mechanisms of Speech Production

When developing speech synthesis for its many possible applications, such as a broad range of telecommunications applications, aids for the handicapped, and the diagnosis of articulation deficiencies, it is helpful to have an understanding of the mechanisms of speech production so that these processes can be modeled.

Figure 1-1 is a midsagittal section of a portion of the human body, showing the appropriate organs for speech production, which include the lungs, larynx, pharynx, nose.