Signals and Communication Technology

Mourad Abbas, Editor

Analysis and Application of Natural Language and Speech Processing

Series Editors:
Emre Celebi, Department of Computer Science, University of Central Arkansas, Conway, AR, USA
Jingdong Chen, Northwestern Polytechnical University, Xi'an, China
E. S. Gopi, Department of Electronics and Communication Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India
Amy Neustein, Linguistic Technology Systems, Fort Lee, NJ, USA
H. Vincent Poor, Department of Electrical Engineering, Princeton University, Princeton, NJ, USA
Antonio Liotta, University of Bolzano, Bolzano, Italy
Mario Di Mauro, University of Salerno, Salerno, Italy

This series is devoted to fundamentals and applications of modern methods of signal processing and cutting-edge communication technologies. The main topics are information and signal theory, acoustical signal processing, image processing and multimedia systems, mobile and wireless communications, and computer and communication networks. Volumes in the series address researchers in academia and industrial R&D departments. The series is application-oriented. The level of presentation of each individual volume, however, depends on the subject and can range from practical to scientific.

Indexing: All books in "Signals and Communication Technology" are indexed by Scopus and zbMATH.

For general information about this book series, comments or suggestions, please contact Mary James at [email protected] or Ramesh Nath Premnath at [email protected].

Editor: Mourad Abbas, High Council of Arabic, Algiers, Algeria

ISSN 1860-4862; ISSN 1860-4870 (electronic)
Signals and Communication Technology
ISBN 978-3-031-11034-4; ISBN 978-3-031-11035-1 (eBook)
https://doi.org/10.1007/978-3-031-11035-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Preface

The increased access to powerful processors has made possible significant progress in natural language processing (NLP). We find more research in NLP targeting a diverse spectrum of major industries that use voice recognition, text-to-speech (TTS) solutions, speech translation, natural language understanding (NLU), and many other applications and techniques related to these areas.
This book presents the latest research related to natural language processing and speech technology and sheds light on the main topics for readers interested in this field. For TTS and automatic speech recognition, it is demonstrated how transfer learning can be explored in order to generate speech in other voices from the TTS of a specific language (Italian) and to improve speech recognition for non-native English. Language resources are the cornerstone for building high-quality systems; however, some languages, such as Arabic, are considered under-resourced compared to English. Thus, a new Arabic linguistic pipeline for NLP is presented to enrich Arabic language resources and to solve common NLP issues, like word segmentation, POS tagging, and lemmatization. Arabic named entity recognition, a challenging task, is addressed within this book using a transformer-based CRF model.

In addition, the readers of this book will discover conceptions and solutions for other NLP issues such as language modeling, question answering, dialog systems, and sentence embeddings.

Mourad Abbas

Contents

ITAcotron 2: The Power of Transfer Learning in Expressive TTS Synthesis
Anna Favaro, Licia Sbattella, Roberto Tedesco, and Vincenzo Scotti

Improving Automatic Speech Recognition for Non-native English with Transfer Learning and Language Model Decoding
Peter Sullivan, Toshiko Shibano, and Muhammad Abdul-Mageed

Kabyle ASR Phonological Error and Network Analysis
Christopher Haberland and Ni Lao

ALP: An Arabic Linguistic Pipeline
Abed Alhakim Freihat, Gábor Bella, Mourad Abbas, Hamdy Mubarak, and Fausto Giunchiglia

Arabic Anaphora Resolution System Using New Features: Pronominal and Verbal Cases
Abdelhalim Hafedh Dahou, Mohamed Abdelmoazz, and Mohamed Amine Cheragui

A Commonsense-Enhanced Document-Grounded Conversational Agent: A Case Study on Task-Based Dialogue
Carl Strathearn and Dimitra Gkatzia

BloomQDE: Leveraging Bloom's Taxonomy for Question Difficulty Estimation
Sabine Ullrich, Amon Soares de Souza, Josua Köhler, and Michaela Geierhos

A Comparative Study on Language Models for Dravidian Languages
Rahul Raman, Danish Mohammed Ebadulla, Hridhay Kiran Shetty, and Mamatha H. R.

Arabic Named Entity Recognition with a CRF Model Based on Transformer Architecture
Muhammad Al-Qurishi, Riad Souissi, and Sarah Al-Qaseemi

Static Fuzzy Bag-of-Words: Exploring Static Universe Matrices for Sentence Embeddings
Matteo Muffo, Roberto Tedesco, Licia Sbattella, and Vincenzo Scotti

Index

ITAcotron 2: The Power of Transfer Learning in Expressive TTS Synthesis

Anna Favaro, Licia Sbattella, Roberto Tedesco, and Vincenzo Scotti

Abstract A text-to-speech (TTS) synthesiser has to generate intelligible and natural speech while modelling linguistic and paralinguistic components characterising human voice. In this work, we present ITAcotron 2, an Italian TTS synthesiser able to generate speech in several voices. In its development, we explored the power of transfer learning by iteratively fine-tuning an English Tacotron 2 spectrogram predictor on different Italian data sets. Moreover, we introduced a conditioning strategy to enable ITAcotron 2 to generate new speech in the voice of a variety of speakers.
To do so, we examined the zero-shot behaviour of a speaker encoder architecture, previously trained to accomplish a speaker verification task with English speakers, to represent Italian speakers' voice prints. We asked 70 volunteers to evaluate intelligibility, naturalness, and similarity between synthesised voices and real speech from target speakers. Our model achieved a MOS score of 4.15 in intelligibility, 3.32 in naturalness, and 3.45 in speaker similarity. These results showed the successful adaptation of the refined system to the new language and its ability to synthesise novel speech in the voice of several speakers.

A. Favaro
Center for Language and Speech Processing (CLSP), Johns Hopkins University, Baltimore, MD, USA
e-mail: [email protected]

L. Sbattella · R. Tedesco · V. Scotti
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano, Milano, MI, Italy
e-mail: [email protected]; [email protected]; [email protected]

1 Introduction

The development of text-to-speech (TTS) synthesis systems is one of the oldest problems in the natural language processing (NLP) area and has a wide variety of applications [14]. Such systems are designed to output the waveform of a voice uttering the input text string. In recent years, the introduction of approaches based on deep learning (DL), and in particular the end-to-end ones [11, 20, 22, 25], has led to significant improvements.

Most of the evaluations carried out on these models are performed on languages with many available resources, like English. Therefore, it is hard to tell how good these models are and whether they generalise across languages. With this work, we propose to study how these models behave with less-resourced languages, leveraging the transfer learning approach.

In particular, we evaluated the effectiveness of transfer learning on a TTS architecture, experimenting with the English and Italian languages. Thus, we started from the English TTS Tacotron 2 and fine-tuned its training on a collection of Italian corpora. Then, we extended the resulting model with speaker conditioning; the result was an Italian TTS we named ITAcotron 2.

ITAcotron 2 was evaluated, through human assessment, on intelligibility and naturalness of the synthesised audio clips, as well as on speaker similarity between target and different voices. In the end, we obtained reasonably good results, in line with those of the original model.

The rest of this paper is divided into the following sections: In Sect. 2, we introduce the problem. In Sect. 3, we present some available solutions. In Sect. 4, we detail the aim of the paper and the experimental hypotheses we assumed. In Sect. 5, we present the corpora employed to train and test our model. In Sect. 6, we explain the structure of the synthesis pipeline we are proposing and how we adapted it to Italian from English. In Sect. 7, we describe the experimental approach we followed to assess the model quality. In Sect. 8, we comment on the results of our model. Finally, in Sect. 9, we sum up our work and suggest possible future extensions.
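As a rough illustration of the speaker-conditioning strategy described above, the following Python/PyTorch sketch shows one common way such conditioning can be wired: a frozen speaker encoder turns a short reference clip into a fixed "voice print" vector, which is then broadcast and concatenated with the outputs of a Tacotron-2-style text encoder before they reach the attention-based decoder. This is not the ITAcotron 2 implementation; all module names, layer sizes, and the concatenation point are illustrative assumptions.

```python
# Minimal sketch (illustrative only): zero-shot speaker conditioning of a
# Tacotron-2-style text encoder. Sizes and module names are assumptions.
import torch
import torch.nn as nn

class SpeakerEncoder(nn.Module):
    """Stand-in for a speaker-verification encoder pre-trained on English data."""
    def __init__(self, n_mels: int = 80, emb_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, emb_dim, batch_first=True)

    def forward(self, ref_mels: torch.Tensor) -> torch.Tensor:
        # ref_mels: (batch, frames, n_mels) mel spectrogram of a reference clip
        _, hidden = self.rnn(ref_mels)
        # The L2-normalised final state acts as the speaker's voice print
        return nn.functional.normalize(hidden[-1], dim=-1)        # (batch, emb_dim)

class ConditionedTextEncoder(nn.Module):
    """Tacotron-2-style character encoder whose outputs carry the voice print."""
    def __init__(self, n_symbols: int = 148, char_dim: int = 512, emb_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, char_dim)
        self.rnn = nn.LSTM(char_dim, char_dim // 2, batch_first=True, bidirectional=True)

    def forward(self, char_ids: torch.Tensor, voice_print: torch.Tensor) -> torch.Tensor:
        encoded, _ = self.rnn(self.embed(char_ids))                # (batch, chars, char_dim)
        # Broadcast the voice print over the character axis and concatenate it
        speaker = voice_print.unsqueeze(1).expand(-1, encoded.size(1), -1)
        return torch.cat([encoded, speaker], dim=-1)               # (batch, chars, char_dim + emb_dim)

# Usage: the speaker encoder stays frozen (zero-shot); only the spectrogram
# predictor would be fine-tuned on the target-language corpora.
speaker_encoder = SpeakerEncoder().eval()
text_encoder = ConditionedTextEncoder()
with torch.no_grad():
    voice_print = speaker_encoder(torch.randn(1, 120, 80))         # fake 120-frame reference clip
memory = text_encoder(torch.randint(0, 148, (1, 30)), voice_print)
# `memory` is what the attention-based spectrogram decoder would attend to.
```

In the system described here, the voice print is likewise obtained zero-shot (the speaker encoder keeps its English speaker-verification training), while the spectrogram predictor is the component that is iteratively fine-tuned on the Italian data sets.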
2 Background

Every TTS synthesiser represents an original imitation of the human reading capability and, to be implemented, it has to cope with the technological and imaginative constraints characterising the period of its creation.

In the mid-1980s, the concomitant developments in NLP and digital signal processing (DSP) techniques broadened the applications of these systems. Their first employment was in screen reading systems for blind people, where a TTS was in charge of reading user interfaces and textual contents (e.g. websites, books, etc.), converting them into speech. Even though the early screen readers (e.g. JAWS¹) sounded mechanical and robotic, they represented a valuable alternative for blind people to the usual braille reading.

As the quality of TTS systems was progressively enhanced, their adoption was later extended to other practical domains such as telecommunications services,

¹ https://www.freedomscientific.com/products/software/jaws.