ebook img

Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis PDF

145 Pages·2019·11.016 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis

SPRINGER BRIEFS IN ELECTRICAL AND COMPUTER ENGINEERING  SPEECH TECHNOLOGY K. Sreenivasa Rao N. P. Narendra Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis 123 SpringerBriefs in Speech Technology Studies in Speech Signal Processing, Natural Language Understanding, and Machine Learning SeriesEditor: AmyNeustein SpringerBriefs present concise summaries of cutting-edge research and practical applications across a wide spectrum of fields. Featuring compact volumes of 50 to 125 pages, the series covers a range of content from professional to academic. Typicaltopicsmightinclude: • Atimelyreportofstate-of-theartanalyticaltechniques • Abridgebetweennewresearchresults,aspublishedinjournalarticles,anda contextualliteraturereview • Asnapshotofahotoremergingtopic • Anin-depthcasestudyorclinicalexample • Apresentationofcoreconceptsthatstudentsmustunderstandinordertomake independentcontributions Briefsarecharacterizedbyfast,globalelectronicdissemination,standardpublishing contracts, standardized manuscript preparation and formatting guidelines, and expeditedproductionschedules. The goal of the SpringerBriefs in Speech Technology series is to serve as an important reference guide for speech developers, system designers, speech engineers and other professionals in academia, government and the private sector. To accomplish this task, the series will showcase the latest findings in speech technology, ranging from a comparative analysis of contemporary methods of speech parameterization to recent advances in commercial deployment of spoken dialogsystems. Moreinformationaboutthisseriesathttp://www.springer.com/series/10043 K. Sreenivasa Rao • N. P. Narendra Source Modeling Techniques for Quality Enhancement in Statistical Parametric Speech Synthesis 123 K.SreenivasaRao N.P.Narendra DepartmentofComputerScienceand AaltoUniversity Engineering Espoo,Finland IndianInstituteofTechnologyKharagpur Kharagpur,WestBengal,India ISSN2191-737X ISSN2191-7388 (electronic) SpringerBriefsinSpeechTechnology ISBN978-3-030-02758-2 ISBN978-3-030-02759-9 (eBook) https://doi.org/10.1007/978-3-030-02759-9 LibraryofCongressControlNumber:2018959748 ©TheAuthor(s),underexclusivelicencetoSpringerNatureSwitzerlandAG2019 Thisworkissubjecttocopyright.AllrightsaresolelyandexclusivelylicensedbythePublisher,whether thewholeorpartofthematerialisconcerned,specificallytherightsoftranslation,reprinting,reuse ofillustrations,recitation,broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,and transmissionorinformationstorageandretrieval,electronicadaptation,computersoftware,orbysimilar ordissimilarmethodologynowknownorhereafterdeveloped. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Thepublisher,theauthorsandtheeditorsaresafetoassumethattheadviceandinformationinthisbook arebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor theeditorsgiveawarranty,expressorimplied,withrespecttothematerialcontainedhereinorforany errorsoromissionsthatmayhavebeenmade.Thepublisherremainsneutralwithregardtojurisdictional claimsinpublishedmapsandinstitutionalaffiliations. ThisSpringerimprintispublishedbytheregisteredcompanySpringerNatureSwitzerlandAG Theregisteredcompanyaddressis:Gewerbestrasse11,6330Cham,Switzerland Preface Speechisthemostnaturalwayforhumanstocommunicatewitheachother.Synthe- sis of artificial human speech provides efficient human-computer communication. Nowadays, the speech synthesis systems are widely used in various applications such as screen readers for visually challenged people, speech interface for mobile devices,navigation,andpersonalguidancegadgets.Ashumansareverysensitivein perceiving even the slightest distortions in the speech signal, speech synthesizers with suboptimal quality make them unfit for usage in commercial applications. The main goal of this book is to improve the quality of statistical parametric speech synthesis (SPSS) by efficiently modeling the source or excitation signal. The excitation signal used in synthesis should preserve all natural variations so that the synthesized speech is close to natural quality. The work presented in this bookconfinesitsscopetothe(1)accurateestimationofpitch(F )and(2)precise 0 modeling of excitation signal. For modeling the excitation signal, both parametric andhybridapproachesareexplored.Inthiswork,creakyvoicehasbeensynthesized atappropriateplacesbyproposingappropriatemethodsandmodels. Thecontentsofthebookareusefulforbothresearchersandsystemdevelopers. For researchers, the book will be useful for knowing the current state-of-the- art excitation source models for SPSS and further refining the source models to incorporate the realistic semantics present in the text. For system developers, the book will be useful to integrate the sophisticated excitation source models mentioned in the book to the latest models of mobile/smart phones. The book has beenorganizedasfollows: • Chapter 1 introduces the topic of text-to-speech synthesis. Different speech synthesisapproachesarebrieflydiscussed. • Chapter2providesareviewofthestate-of-the-artmethodsforF0estimationand parametricandhybridsourcemodelingapproaches. • Chapter3discussesthedesignofavoicingdetectionandF estimationmethod 0 by adaptively choosing appropriate window size for zero-frequency filtering method. • Chapter4presentstwoparametricsourcemodelingmethods. v vi Preface • Chapter 5 describes two proposed hybrid methods of modeling the excitation signal. • Chapter 6 deals with the generation of creaky voice by addressing two main issues, namely, automatic detection of creaky voice and source modeling of creakyvoice. • Chapter7summarizesthecontributionsofthebookalongwithsomeimportant conclusions. Directions toward the scope for possible future work are also discussed. WewouldespeciallyliketothankallprofessorsoftheDepartmentofComputer Science and Engineering, IIT Kharagpur, for their consistent support during the course of editing and organization of the book. Special thanks to our colleagues at Indian Institute of Technology, Kharagpur, India, for their cooperation to carry outthework.Wearegratefultoourparentsandfamilymembersfortheirconstant supportandencouragement.Finally,wethankallourfriendsandwell-wishers. Kharagpur,India K.SreenivasaRao Espoo,Finland N.P.Narendra Contents 1 Introduction .................................................................. 1 1.1 Introduction ............................................................. 1 1.2 SpeechSynthesisMethods ............................................. 2 1.2.1 FormantSynthesis.............................................. 2 1.2.2 ArticulatorySynthesis.......................................... 3 1.2.3 ConcatenativeSynthesis........................................ 3 1.2.4 StatisticalParametricSpeechSynthesis ....................... 4 1.2.5 HybridSynthesisMethods ..................................... 5 1.3 ObjectivesandScopeoftheWork ..................................... 5 1.4 ContributionsoftheBook.............................................. 6 1.4.1 RobustVoicingDetectionandF EstimationMethod........ 6 0 1.4.2 ParametricApproachofModelingtheExcitationSignal..... 7 1.4.3 HybridApproachofModelingtheExcitationSignal......... 7 1.4.4 GenerationofCreakyVoice.................................... 7 1.5 OrganizationoftheBook............................................... 8 References..................................................................... 9 2 BackgroundandLiteratureReview ....................................... 11 2.1 HMM-BasedSpeechSynthesis......................................... 11 2.1.1 HiddenMarkovModel ......................................... 12 2.1.2 SystemOverview............................................... 14 2.1.3 DurationModeling.............................................. 15 2.1.4 DecisionTree-BasedContextClustering...................... 15 2.1.5 Synthesis ........................................................ 16 2.2 VoicingDetectionandF Estimation:AReview...................... 16 0 2.3 SourceModelingApproaches:AReview.............................. 19 2.4 GenerationofCreakyVoice:AReview................................ 22 2.5 Summary ................................................................ 24 References..................................................................... 25 vii viii Contents 3 RobustVoicingDetectionandF EstimationMethod................... 29 0 3.1 F ModelingandGenerationinHTS .................................. 29 0 3.2 ProposedMethodforVoicingDetectionandF Estimation.......... 30 0 3.2.1 Zero-FrequencyFilteringMethodforDetectingthe InstantsofSignificantExcitation .............................. 31 3.2.2 VoicingDetection............................................... 32 3.2.3 InfluenceofWindowSizeontheStrengthofExcitation ..... 34 3.2.4 F Estimation................................................... 38 0 3.2.5 PerformanceEvaluation........................................ 40 3.3 ImplementationoftheProposedVoicingDetectionandF 0 ExtractioninHTSFramework ......................................... 43 3.4 Evaluation ............................................................... 45 3.4.1 EvaluationofVoicingDetection............................... 46 3.4.2 SubjectiveEvaluation........................................... 48 3.5 Summary ................................................................ 50 References..................................................................... 51 4 ParametricApproachofModelingtheSourceSignal ................... 53 4.1 Parametric Source Modeling Method Based on Principal ComponentAnalysis.................................................... 53 4.1.1 GenerationofPitch-SynchronousResidualFrames........... 53 4.1.2 ParameterizationofResidualFrameUsingPCA.............. 55 4.1.3 SpeechSynthesisUsingtheProposedPCA-Based ParametricSourceModel....................................... 57 4.1.4 Evaluation....................................................... 58 4.2 Parametric Source Modeling Method Based on the DeterministicandNoiseComponentsofResidualFrames ........... 60 4.2.1 AnalysisofCharacteristicsofResidualFrames............... 60 4.2.2 OverviewofProposedParametricSourceModel............. 62 4.2.3 ParameterizationofDeterministicComponent................ 63 4.2.4 ParameterizationofNoiseComponent ........................ 64 4.2.5 SpeechSynthesisUsingtheProposedDeterministic andNoiseComponent-BasedParametricSourceModel ..... 66 4.2.6 Evaluation....................................................... 67 4.3 Summary ................................................................ 71 References..................................................................... 73 5 HybridApproachofModelingtheSourceSignal........................ 75 5.1 OptimalResidualFrame-BasedHybridSourceModelingMethod... 75 5.1.1 ComputationofOptimalResidualFrameforaPhone........ 75 5.1.2 ClusteringtheOptimalResidualFrames ...................... 77 5.1.3 Speech Synthesis Using the Proposed Optimal ResidualFrame-BasedHybridSourceModel................. 78 5.1.4 Evaluation....................................................... 80 Contents ix 5.2 Time-DomainDeterministicPlusNoiseModel-BasedHybrid SourceModel............................................................ 82 5.2.1 DecompositionofDeterministicandNoiseComponents..... 88 5.2.2 ClusteringtheDeterministicComponents..................... 88 5.2.3 StoringNaturalInstanceofNoiseSignalAlongwith theDeterministicComponent.................................. 89 5.2.4 ParameterizingtheNoiseComponents........................ 90 5.2.5 SpeechSynthesisUsingtheProposedTime-Domain DeterministicPlusNoiseModel-BasedHybridSource Model............................................................ 91 5.2.6 Evaluation....................................................... 93 5.2.7 Discussion....................................................... 100 5.3 Summary ................................................................ 102 References..................................................................... 103 6 GenerationofCreakyVoice................................................. 105 6.1 HMM-BasedSpeechSynthesisSystemforGeneratingModal andCreakyVoices ...................................................... 105 6.2 AutomaticDetectionofCreakyVoice ................................. 107 6.2.1 AnalysisofEpochParameters ................................. 108 6.2.2 ComputationofVarianceofEpochParameters ............... 110 6.2.3 ClassificationUsingVarianceofEpochParameters .......... 111 6.2.4 PerformanceEvaluation........................................ 111 6.3 HybridSourceModelforGeneratingCreakyExcitation ............. 115 6.3.1 GenerationofCreakyResidualFrames........................ 117 6.3.2 DeterministicPlusNoiseDecompositionforEvery PhoneticClass .................................................. 118 6.3.3 SynthesisofCreakyVoiceUsingProposedHybrid SourceModel ................................................... 119 6.3.4 Evaluation....................................................... 120 6.3.5 Discussion....................................................... 122 6.4 Summary ................................................................ 123 References..................................................................... 123 7 SummaryandConclusions ................................................. 125 7.1 SummaryoftheBook................................................... 125 7.2 ContributionsoftheBook.............................................. 127 7.3 DirectionsforFutureWork............................................. 128 References..................................................................... 128 Index............................................................................... 131

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.