Table of Contents

Signals and Communication Technology
More information about this series at http://www.springer.com/series/4748

Asoke Kumar Datta
Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech

Asoke Kumar Datta
Society for Natural Language Technology Research (SNLTR)
Kolkata, West Bengal, India

ISSN 1860-4862
ISSN 1860-4870 (electronic)
Signals and Communication Technology
ISBN 978-981-10-7015-0
ISBN 978-981-10-7016-7 (eBook)
https://doi.org/10.1007/978-981-10-7016-7
Library of Congress Control Number: 2017956315
© Springer Nature Singapore Pte Ltd. 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

This book is dedicated to my departed revered mother Shantilata Datta. May she be ever happy.

Contents

1 Introduction to ESOLA
  1.1 Review of Speech Synthesis
  1.2 Methods and Algorithms of Speech Synthesis
    1.2.1 Articulatory Synthesis
    1.2.2 Formant Synthesis
    1.2.3 Linear Prediction Based Methods
    1.2.4 Sinusoidal Models
    1.2.5 Sinusoidal Analysis
    1.2.6 Sinusoidal Synthesis
    1.2.7 Concatenative Synthesis
    1.2.8 PSOLA Methods
    1.2.9 Other Techniques for Synthesis
  1.3 Introduction to ESOLA
  1.4 Organisation of the Book
  References

2 Epoch Synchronous Overlap Add (Esola) Algorithm
  2.1 Introduction
  2.2 Basic Principles of ESOLA
    2.2.1 Partneme: Sub-Phonemic Signal Inventory
  2.3 Structure of Esola
    2.3.1 Signal Units Representation
    2.3.2 Word Number Bus: Word Segmentation
    2.3.3 Syllable Number Bus: Syllable Breaking Algorithm
    2.3.4 Special Emphasis Bus
    2.3.5 Textual Language Processing (TLP) Unit
  2.4 Speech Engine
    2.4.1 Epoch Synchronous Overlap Add (ESOLA) Technique
    2.4.2 Epoch Points for Voiced Speech Signals and Perceptual Pitch Period (PPP)
    2.4.3 ESOLA Framework
    2.4.4 Monotonic Properties
    2.4.5 Properties Related to Peak
    2.4.6 Properties Related to Valley
    2.4.7 Pitch Modification Using Extended Bell Function
  2.5 Preparation of Signal Dictionary
    2.5.1 Recording
    2.5.2 Pitch Normalization
    2.5.3 Amplitude Normalization
    2.5.4 Complexity Matching: Regeneration of signal
  2.6 Synthesis Procedures
    2.6.1 Rules for Token Generation
    2.6.2 Synthesis Operations
    2.6.3 Signal Processing Aspects
  2.7 Esola and Other Concatenative Approaches
  2.8 Conclusions and Discussion
  References

3 State Phase Analysis: PDA/VDA Algorithm
  3.1 Introduction
  3.2 State Phase Analysis
    3.2.1 Pseudo Phonemic Labeling
    3.2.2 Parameter Definitions
  3.3 Classificatory Analysis
  3.4 Pitch Extraction
    3.4.1 Classification Algorithm
    3.4.2 Experimental Details
  3.5 Comparative Assessment of Pitch Extraction
    3.5.1 Comparison of Pitch Data Obtained by State-phase Method
  3.6 Classification Results
  3.7 Analysis-Resynthesis Using State Phase Method
    3.7.1 Extraction of Signal Elements
    3.7.2 Extraction of Elements in Voiced Region
    3.7.3 Extraction of Elements in Unvoiced Regions
    3.7.4 Coding for Data Packet
    3.7.5 Error Detection and Correction
    3.7.6 Resynthesis Using Linear Interpolation
    3.7.7 Decoding and Regeneration
    3.7.8 Results
  3.8 Discussion
  References

4 Phonological Rules for TTS
  4.1 Introduction
  4.2 Historical Background of SCB Phonology
  4.3 Phones and Phonology of SCB
    4.3.1 Compilation of the Phonological Rules for Bengali
    4.3.2 Rule for এ (E)
    4.3.3 Rules for জ্ঞ (= J+N1)
    4.3.4 Rules for (Y-Ligature)
    4.3.5 Rules for (B-Ligature)
    4.3.6 Rules for (M-Ligature)
    4.3.7 Rule for (R-Ligature)
    4.3.8 Rule for ম (M) and ন (N)
    4.3.9 Rules for শ (SH), ষ (S1) and স (S)
    4.3.10 Rule for Chandra Bindu ( )
  4.4 Architecture for G2P Conversion System
    4.4.1 Structure of RDB Table
    4.4.2 Generation of Forest from RDB Table
  4.5 Software Implementation of Phonological Rules
  4.6 Conclusions and Discussion
  Appendix
  References

5 Intonation Rules for Text Reading
  5.1 Introduction
  5.2 Simplification of Pitch Movement
  5.3 Stylization
  5.4 Perceptual Evaluation of Syllabic Stylization
    5.4.1 F0 Modification
  5.5 Perception Test
    5.5.1 Results
    5.5.2 Intonation Patterns for SCB
  5.6 Method of Application in Synthesis
    5.6.1 Finding of Word Intonation Pattern
    5.6.2 Finding of Syllabic Intonation Pattern
    5.6.3 Synthesis
  5.7 Prosody
    5.7.1 Duration Rules
  5.8 Conclusions and Discussion
  References

6 Shimmer, Jitter and Complexity Perturbation
  6.1 Introduction
  6.2 Methodology
    6.2.1 Glottal Cycle Detection
    6.2.2 Relative Jitter and Shimmer
    6.2.3 Complexity Perturbation (CP)
  6.3 Experimental Procedures
    6.3.1 Results and Discussion on Obtained Values
  6.4 Listening Test
  6.5 Conclusions and Discussion
  References

Appendix
Epilogue

Description:

This book presents details of a text-to-speech synthesis procedure using epoch synchronous overlap add (ESOLA), and provides a solution for development of a text-to-speech system using minimum data resources compared to existing solutions. It also examines most natural speech signals including rando

Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech PDF

206 Pages·2018·7.09 MB·English

by Asoke Kumar Datta (auth.)

Most books are stored in the elastic cloud, where traffic is expensive. For this reason, we limit the number of daily downloads.

Download Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech by Asoke Kumar Datta (auth.) in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

About Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech

Detailed Information

Author:	Asoke Kumar Datta (auth.)
Publication Year:	2018
Pages:	206
Language:	English
File Size:	7.09 MB
Format:	PDF
Price:	FREE

Why Choose PDFdrive for Your Free Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech Download?

100% Free: No hidden fees or subscriptions are required for one book per day.
No Registration: Immediate access, without creating an account, for one book per day.
Safe and Secure: Clean downloads without malware or viruses.
Multiple Formats: PDF, MOBI, Mpub, and others, optimized for all devices.
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech PDF?

Yes, on https://PDFdrive.to you can download Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech by Asoke Kumar Datta (auth.) completely free. We don't require any payment, subscription, or registration to access this PDF file, for up to 3 books every day.

How can I read Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech on my mobile device?

After downloading Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech?

Yes, this is the complete PDF version of Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech by Asoke Kumar Datta (auth.). You will be able to read the entire content, as in the printed version, with no pages missing.

Is it legal to download Epoch Synchronous Overlap Add (ESOLA): A Concatenative Synthesis Procedure for Speech PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.