
Progress in Speech Synthesis PDF

590 Pages·1997·13.488 MB·English

Preview Progress in Speech Synthesis

Progress in Speech Synthesis

Jan P.H. van Santen, Richard W. Sproat, Joseph P. Olive, Julia Hirschberg
Editors

With 158 Illustrations

Springer

Jan P.H. van Santen, Bell Laboratories, Room 2D-452, 600 Mountain Avenue, Murray Hill, NJ 07974-0636, USA
Richard W. Sproat, Bell Laboratories, Room 2D-451, 600 Mountain Avenue, Murray Hill, NJ 07974-0636, USA
Joseph P. Olive, Bell Laboratories, Room 2D-447, 600 Mountain Avenue, Murray Hill, NJ 07974-0636, USA
Julia Hirschberg, AT&T Research, Room 2C-409, 600 Mountain Avenue, Murray Hill, NJ 07974-0636, USA

Library of Congress Cataloging-in-Publication Data
Progress in speech synthesis / Jan P.H. van Santen ... [et al.], editors.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-4612-7328-8
ISBN 978-1-4612-1894-4 (eBook)
DOI 10.1007/978-1-4612-1894-4
1. Speech synthesis. I. Santen, Jan P.H. van.
TK7882.S65S6785 1996
006.5'4-dc20 96-10596

Printed on acid-free paper.

© 1997 Springer Science+Business Media New York
Originally published by Springer-Verlag New York, Inc. in 1997
Softcover reprint of the hardcover 1st edition 1997

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher, Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Natalie Johnson; manufacturing supervised by Jacqui Ashri.
Camera-ready copy prepared using LaTeX.
Printed and bound by Edwards Brothers, Inc., Ann Arbor, MI.
Additional material to this book can be downloaded from http://extras.springer.com.

9 8 7 6 5 4 3 2

Preface

Text-to-speech synthesis involves the computation of a speech signal from input text. Accomplishing this requires a system that consists of an astonishing range of components, from abstract linguistic analysis of discourse structure to speech coding. Several implications flow from this fact. First, text-to-speech synthesis is inherently multidisciplinary, as is reflected by the authors of this book, whose backgrounds include engineering, multiple areas of linguistics, computer science, mathematical psychology, and acoustics. Second, progress in these research areas is uneven because the problems faced vary so widely in difficulty. For example, producing flawless pitch accent assignment for complex, multi-paragraph textual input is extremely difficult, whereas producing decent segmental durations given that all else has been computed correctly is not very difficult. Third, the only way to summarize research in all areas relevant for TTS is in the form of a multi-authored book: no single person, or even small group of persons, has a sufficiently broad scope.

The most important goal of this book is, of course, to provide an overview of these research areas by having invited key players in each area to contribute to this book. This will give the reader a complete picture of what the challenges and solutions are and in what directions researchers are moving.

But an important second goal is to allow the reader to judge what all this work adds up to: the scientific sophistication may be impressive, but does it really produce good synthetic speech? The book attempts to answer this question in two ways. First, we have asked authors to include results from subjective evaluations in their chapter whenever possible. There is also a special section on perception and evaluation in the book. Second, the book contains a CD-ROM disk with samples of several of the synthesizers discussed.
Both the successes and the failures are of interest, the latter in particular because it is unlikely that samples are included demonstrating major flaws in a system. We thank Christian Benoit for suggesting this idea.

A brief note on the history of this volume: In 1990, the First ESCA Workshop on Text-to-Speech Synthesis was organized in Autrans, France. The organizers of this workshop, Gerard Bailly and Christian Benoit, felt that there was a need for a book containing longer versions of papers from the workshop proceedings, resulting in Talking Machines. In 1994, the editors of the current volume organized the Second ESCA/IEEE/AAAI Workshop on Text-to-Speech Synthesis and likewise decided that a book was necessary that would present the work reported in the proceedings in a more complete, updated, and polished form. To ensure the highest possible quality of the chapters for the current volume, we asked members of the scientific committee of the workshop to select workshop papers for possible inclusion. The editors added their selections, too. We then invited those authors whose work received an unambiguous endorsement from this process to contribute to the book.

Finally, we want to thank the many people who have contributed: the scientific committee members who helped with the selection process; fourteen anonymous reviewers; Bernd Moebius for work on stubborn figures; Alice Greenwood for editorial assistance; Mike Tanenblatt and Juergen Schroeter for processing the speech and video files; David Yarowsky for providing the index; Thomas von Foerster and Kenneth Dreyhaupt at Springer-Verlag for expediting the process; Cathy Hopkins for administrative assistance; and Bell Laboratories for its encouragement of this work.

Jan P. H. van Santen
Richard W. Sproat
Joseph P. Olive
Julia Hirschberg
Murray Hill, New Jersey
October 1995

Contents

Preface
Contributors

I Signal Processing and Source Modeling

1 Section Introduction. Recent Approaches to Modeling the Glottal Source for TTS
  Dan Kahn, Marian J. Macchi
  1.1 Modeling the Glottal Source: Introduction
  1.2 Alternatives to Monopulse Excitation
  1.3 A Guide to the Chapters
  1.4 Summary

2 Synthesizing Allophonic Glottalization
  Janet B. Pierrehumbert, Stefan Frisch
  2.1 Introduction
  2.2 Experimental Data
  2.3 Synthesis Experiments
  2.4 Contribution of Individual Source Parameters
  2.5 Discussion
  2.6 Summary

3 Text-to-Speech Synthesis with Dynamic Control of Source Parameters
  Luis C. Oliveira
  3.1 Introduction
  3.2 Source Model
  3.3 Analysis Procedure
  3.4 Analysis Results
  3.5 Conclusions

4 Modification of the Aperiodic Component of Speech Signals for Synthesis
  Gael Richard, Christophe R. d'Alessandro
  4.1 Introduction
  4.2 Speech Signal Decomposition
  4.3 Aperiodic Component Analysis and Synthesis
  4.4 Evaluation
  4.5 Speech Modifications
  4.6 Discussion and Conclusion

5 On the Use of a Sinusoidal Model for Speech Synthesis in Text-to-Speech
  Miguel Angel Rodriguez Crespo, Pilar Sanz Velasco, Luis Monzon Serrano, Jose Gregorio Escalada Sardina
  5.1 Introduction
  5.2 Overview of the Sinusoidal Model
  5.3 Sinusoidal Analysis
  5.4 Sinusoidal Synthesis
  5.5 Simplification of the General Model
  5.6 Parameters of the Simplified Sinusoidal Model
  5.7 Fundamental Frequency and Duration Modifications
  5.8 Analysis and Resynthesis Experiments
  5.9 Conclusions

II Linguistic Analysis

6 Section Introduction. The Analysis of Text in Text-to-Speech Synthesis
  Richard W. Sproat

7 Language-Independent Data-Oriented Grapheme-to-Phoneme Conversion
  Walter M. P. Daelemans, Antal P. J. van den Bosch
  7.1 Introduction
  7.2 Design of the System
  7.3 Related Approaches
  7.4 Evaluation
  7.5 Conclusion

8 All-Prosodic Speech Synthesis
  Arthur Dirksen, John S. Coleman
  8.1 Introduction
  8.2 Architecture
  8.3 Polysyllabic Words
  8.4 Connected Speech
  8.5 Summary

9 A Model of Timing for Nonsegmental Phonological Structure
  John Local, Richard Ogden
  9.1 Introduction
  9.2 Syllable Linkage and Its Phonetic Interpretation in YorkTalk
  9.3 The Description and Modeling of Rhythm
  9.4 Comparison of the Output of YorkTalk with Natural Speech and Synthesis
  9.5 Conclusion

10 A Complete Linguistic Analysis for an Italian Text-to-Speech System
  Giuliano Ferri, Piero Pierucci, Donatella Sanzone
  10.1 Introduction
  10.2 The Morphologic Analysis
  10.3 The Phonetic Transcription
  10.4 The Morpho-Syntactic Analysis
  10.5 Performance Assessment
  10.6 Conclusion

11 Discourse Structural Constraints on Accent in Narrative
  Christine H. Nakatani
  11.1 Introduction
  11.2 The Narrative Study
  11.3 A Discourse-Based Interpretation of Accent Function
  11.4 Discourse Functions of Accent
  11.5 Discussion
  11.6 Conclusion

12 Homograph Disambiguation in Text-to-Speech Synthesis
  David Yarowsky
  12.1 Problem Description
  12.2 Previous Approaches
  12.3 Algorithm
  12.4 Decision Lists for Ambiguity Classes
  12.5 Evaluation
  12.6 Discussion and Conclusions

III Articulatory Synthesis and Visual Speech

13 Section Introduction. Talking Heads in Speech Synthesis
  Dominic W. Massaro, Michael M. Cohen

14 Section Introduction. Articulatory Synthesis and Visual Speech
  Juergen Schroeter
  14.1 Bridging the Gap Between Speech Science and Speech Applications

15 Speech Models and Speech Synthesis
  Mary E. Beckman
  15.1 Theme and Some Examples
  15.2 A Decade and a Half of Intonation Synthesis
  15.3 Models of Time
  15.4 Valediction

16 A Framework for Synthesis of Segments Based on Pseudoarticulatory Parameters
  Corine A. Bickley, Kenneth N. Stevens, David R. Williams
  16.1 Background and Introduction
  16.2 Control Parameters and Mapping Relations
  16.3 Examples of Synthesis from HL Parameters
  16.4 Toward Rules for Synthesis

17 Biomechanical and Physiologically Based Speech Modeling
  Reiner F. Wilhelms-Tricarico, Joseph S. Perkell
  17.1 Introduction
  17.2 Articulatory Synthesizers
  17.3 A Finite Element Tongue Model
  17.4 The Controller
  17.5 Conclusions

18 Analysis-Synthesis and Intelligibility of a Talking Face
  Bertrand Le Goff, Thierry Guiard-Marigny, Christian Benoit
  18.1 Introduction
  18.2 The Parametric Models
  18.3 Video Analysis
  18.4 Real-Time Analysis-Synthesis
  18.5 Intelligibility of the Models
  18.6 Conclusion

19 3D Models of the Lips and Jaw for Visual Speech Synthesis
  Thierry Guiard-Marigny, Ali Adjoudani, Christian Benoit
  19.1 Introduction
  19.2 The 2D Model of the Lips
