
Development of a standard Yorùbá digital text automatic diacritic restoration system PDF

262 Pages·2014·6.056 MB·English


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/321475247

The development of a Standard Yorùbá Diacritics Restoration System
Thesis · May 2014
DOI: 10.13140/RG.2.2.35584.12800
Author: Franklin Oladiipo Asahiah, Obafemi Awolowo University
All content following this page was uploaded by Franklin Oladiipo Asahiah on 02 December 2017.

DEVELOPMENT OF A STANDARD YORÙBÁ DIGITAL TEXT AUTOMATIC DIACRITIC RESTORATION SYSTEM

By

FRANKLIN ỌLÁDIÍPỌ̀ ASAHIAH
M. Sc. (Computer Science), Ifẹ̀

A THESIS SUBMITTED TO
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
FACULTY OF TECHNOLOGY
ỌBÁFẸ́MI AWÓLỌ́WỌ̀ UNIVERSITY, ILÉ-IFẸ̀
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE

2014

AUTHORISATION TO COPY

ỌBÁFẸ́MI AWÓLỌ́WỌ̀ UNIVERSITY, ILÉ-IFẸ̀
HEZEKIAH OLÚWÁSANMÍ LIBRARY
POSTGRADUATE THESIS

Author: Franklin Ọládiípọ̀ ASAHIAH
Title: Development of a Standard Yorùbá Digital Text Automatic Diacritic Restoration System
Degree: Ph.D. (Computer Science)
Year: 2014

I, Franklin Ọládiípọ̀ ASAHIAH, hereby authorise the Hezekiah Olúwásanmí Library to copy my thesis in part or whole in response to request from individuals and/or organisations for the purpose of private study or research.

22/02/2014
Signature of Author and Date

CERTIFICATION

The undersigned hereby certify that this is an original research carried out by Franklin Ọládiípọ̀ ASAHIAH with the registration number TCP/05/06/H/0449 in the Department of Computer Science and Engineering, Faculty of Technology, Ọbáfẹ́mi Awólọ́wọ̀ University under my supervision.

Supervisor: Dr. O. A. Ọdẹ́jọbí
Co-Supervisor: Professor E. R. Adágúnodò
Head of Department: Dr. H. A. Sóríyàn

To Jesus Christ: The Author and Finisher of our Faith ( ) and upholds all things by the word of his power

ACKNOWLEDGEMENT

Blessed be the God and Father of our Lord Jesus Christ! He is the Father of mercies and the God of all comfort who saw to the starting of this work, the provision for this work and the completion of it.

I am also grateful to my supervisory team: my supervisor, Dr. O. A. Ọdẹ́jọbí and my co-supervisor, Prof. E. R. Adágúnodò who honed my skill in research and polished this work to become a Ph.D material. The discussions, corrections and suggestions that I received were highly valued. I also thank members of the panel of examiners of my qualifying examination whose suggestions and critical evaluation made me to improve upon the quality of this work.

I want to thank the current and previous Heads of Department of Computer Science and Engineering who have played various roles in the course of this research work. In the same vein, I want to thank all my colleagues in the department who encouraged, "pushed" and "harassed" me so that this stage of life and career will be over. I also thank the administrative staff in the general office who worked diligently with the processing of forms. I want to thank the management of the Obafemi Awolowo University, Ile-Ife who sponsored this doctoral research programme.

I also want to appreciate the following people who played different roles when the completion of the programmes was in jeopardy: Dr. G. Alébíowú, Prof. M. O. Ìlòrí, Prof. G. A Adéróunmù, Prof(Mrs). K. A. Táíwò, Prof C. Àkànbí, Dr. J. A. Sọ́nibárẹ́. Mr. and Mrs. Àyọ́ Ògúnrùkù were also great sources of encouragement and prayer support.

Furthermore, I want to thank members of my household of faith: Daddy (Dr.) Joe Alla, members of the Core group, and brethren of the Charismatic Renewal Ministries in Rehoboth Sanctuary, Ile-Ife, in particular and all over the world.
I appreciate your prayers all the way through this PhD. To all my brethren in God Is Love CSF and friends, thank you for constantly lifting up hands in prayer.

To members of my families: to my mother, thank you for encouraging daily calls and persistently believing that God is able; to my siblings and their families: Mr. and Dr.(Mrs.) Olúbọ̀dé Sàwẹ̀; Mr. and Mrs. Olúmǐdé Kumọlalọ; Mr. and Mrs. Tèmítọ́pẹ́ Dàda; Dr. and Mrs. Ọláńrewájú Kumọlalọ; Mr. and Dr.(Mrs) Ọlábánjí Kumọlalọ; Mr. and Mrs Àlàbá Kumolalo; Mr. and Mrs. Ọlálékan Àjàyí, thanks for your constant encouragement and prayers. To Chief (Mrs.) Agoye, Mayowa, Tunrayo, and Fisayo Agoye, thank you for your encouragement, prophetic declaration and prayers. May the God of heavens reward you abundantly.

To my children and the board of ownership of Asahiah: Ìníolúwa, Ẹ̀míolúwa and Àràolúwa, thank you for the time you had to stay several hours after school and on weekends with me in the office so that this work could go on and be completed. Much more, I thank you for the sacrifice of the time you needed to enjoy Daddy's company when he was not available. Above all, board of ownership, thank you for the special prayers, especially: "God, please help daddy to finish his Ph.D!"

Lastly, Adẹ́kẹ́misọ́lá, my wife: The grace of God through your faith saw this Ph.D through. Your prayers and fasting were not in vain. You saw opportunities where I had given up. The sleepless nights and sitting by my side to encourage me while I tried to push further in the work have finally paid off. God used you as an angel for me in both this work and other areas. Thank you for believing in me and in this work.

TABLE OF CONTENTS

Certification
Acknowledgement
Table of Contents
List of Tables
List of Figures
Abstract

CHAPTER ONE: INTRODUCTION
1.1 Background
1.2 Statement of the Problem
1.3 Justification
1.4 Objectives
    1.4.1 Aim of this Research
    1.4.2 Research Objectives
1.5 Research Methodology
1.6 Scope of Research
1.7 Research Theory
1.8 Research Context
1.9 Contribution to Knowledge
1.10 Organisation of Thesis

CHAPTER TWO: RESEARCH BACKGROUND AND LITERATURE REVIEW
2.1 Background
2.2 Development of Yorùbá Orthography
    2.2.1 Pre-Standardization Period of Yorùbá Orthography
    2.2.2 Standardization Period of Yorùbá Orthography
    2.2.3 Standard Yorùbá Orthography
2.3 Yorùbá and its Writing System
    2.3.1 Description of Yorùbá Digital Text
    2.3.2 Diacritics: Uses and Importance
    2.3.3 Diacritic Restoration
    2.3.4 The Diacritic Restoration Problem
2.4 Review of Diacritic Restoration
    2.4.1 Abjad-based languages diacritic restoration efforts
    2.4.2 Alphabetic languages with diacritic restoration efforts
    2.4.3 Tone Languages with Diacritic Restoration Efforts
    2.4.4 Diacritic Restoration in Various Languages
    2.4.5 Linguistic Tools Applied to Diacritic Restoration
2.5 Approaches to Diacritic Restoration
    2.5.1 Rule Based Diacritic Restoration
2.6 Statistical Diacritic Restoration
    2.6.1 Models relying on generation of candidate diacritic forms
    2.6.2 Models that rely on probabilistic tagging
    2.6.3 Instance Based Learning Models
    2.6.4 Graphical Models
    2.6.5 Bayesian Classifier Based Model
    2.6.6 HMM Based Models
    2.6.7 Maximum Entropy Markov Model (MEMM)
    2.6.8 Conditional Random Field (CRF)
    2.6.9 Supplementary Models
2.7 Tokens Used for Restoration
2.8 Language Modelling
    2.8.1 Rule-Based Language Models
    2.8.2 Probabilistic Language Models
2.9 Models Applied to Yorùbá Language
    2.9.1 Noisy Channel Model
2.10 Modeling
2.11 Summary of Review
2.12 Conclusion

CHAPTER THREE: MODEL FORMULATION
3.1 Introduction
3.2 Problem Formulation
    3.2.1 Diacritic Restoration as Correction System
    3.2.2 Modelling Background
    3.2.3 Model Design
    3.2.4 Model Processes
    3.2.5 Model Description
3.3 Modelling of the SY Text Diacritic Restoration
    3.3.1 Description of Standard Yorùbá Diacritic Restoration Model (SYRM)
    3.3.2 Modelling of Dot-Below/Tone-Marks Restoration Using CRF
    3.3.3 Modelling Dot-Below/Tone-Marks Restoration Using MBL
3.4 Software System Design for Models

CHAPTER FOUR: MODEL IMPLEMENTATION
4.1 Introduction
4.2 Model Implementation
    4.2.1 Software Tools and Tool-kits Utilized
    4.2.2 Implementation Environment
    4.2.3 Implementation Details
4.3 Building the Offline Statistical Sub-Models
    4.3.1 Experimental Setup
4.4 Building Dot-Below Statistical Models
    4.4.1 Dot-Below, Character and TiMBL based Statistical Model
    4.4.2 Dot-below Syllable-based Statistical Model using TiMBL
    4.4.3 Dot-Below Character-based Statistical Model using CRF
    4.4.4 Dot-Below Syllable-based Statistical Model using CRF
4.5 Building Tone-Marks Statistical Models
    4.5.1 Tone-mark Syllable-based Statistical Model using TiMBL
    4.5.2 Tone-mark Syllable-based Statistical Model using CRF
    4.5.3 Tone-mark Syllable plus-based Statistical Model using CRF
4.6 Alternate Configurations of Statistical Models
    4.6.1 Alternate Tone-Marks Statistical Model using CRF
    4.6.2 Alternate Dot-Below Statistical Model using MBL
    4.6.3 Alternate Dot-Below Statistical Model using CRF
    4.6.4 Post-Processor for Statistical Models
    4.6.5 Dot-Below Rule-based Model for Post-Processing
    4.6.6 Tone-Marks Rule-based Model for Post-Processing
4.7 Data
    4.7.1 Data gathering
    4.7.2 Data Normalization
    4.7.3 Text Data Creation
    4.7.4 Composition of Textual Data
    4.7.5 Distribution of Tokens in the Text Data
4.8 Constraints on Diacritics Occurrence in SY text
    4.8.1 Phonological Constraints on Tone Occurrence within SY
    4.8.2 Phonological Constraints on Phonemic Sequence within SY
4.9 Measurement Parameters
4.10 SY Automatic Diacritic Restoration Working Model

CHAPTER FIVE: RESULTS AND DISCUSSION
5.1 Introduction
5.2 Results
    5.2.1 Result of Dot-Below Working Sub-Model
    5.2.2 Discussion of Result of Dot-Below Restoration Sub-Model
    5.2.3 Effect of Token Type on Dot-Below Restoration Accuracy
    5.2.4 Effect of Algorithm on Dot-Below Restoration Accuracy
    5.2.5 Dot-Below Restoration Optimal Configuration
    5.2.6 Result of Tone-Marks Working Sub-Model
    5.2.7 Discussion of Results of Tone-Marks Restoration
    5.2.8 Effect of Token Type on Tone-Marks Restoration Accuracy
    5.2.9 Effect of Algorithm on Tone-Mark Restoration Accuracy
    5.2.10 Result of Error at Word-Level
    5.2.11 Discussion of Results
5.3 Summary

CHAPTER SIX: CONCLUSION AND RECOMMENDATION
6.1 Summary of Thesis
6.2 Recommendation
6.3 Suggestion for Further Research
6.4 Conclusion

REFERENCES

APPENDICES
APPENDIX A: Snapshot of Working Model
APPENDIX B: Word-Level Evaluation Text
APPENDIX C: Program Listing
