Automatic Summarization For Amharic Text Using Open Text Summarizer PDF

159 Pages·2013·2.94 MB·English
by Addis Ashagre Teklewold

Preview Automatic Summarization For Amharic Text Using Open Text Summarizer

ADDIS ABABA UNIVERSITY
SCHOOL OF GRADUATE STUDIES
COLLEGE OF NATURAL SCIENCE
SCHOOL OF INFORMATION SCIENCE

AUTOMATIC SUMMARIZATION FOR AMHARIC TEXT USING OPEN TEXT SUMMARIZER

A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Science

By ADDIS ASHAGRE TEKLEWOLD
JUNE, 2013

Name and Signature of Members of the Examining Board for Approval

Name                   Title          Signature    Date
                       Chairperson
Martha Yifiru (PhD)    Advisor
Ermias Abebe           Examiner

Declaration

This thesis is my original work and has not been submitted as a partial requirement for a degree in any university.

Addis Ashagre Teklewold

The thesis has been submitted for examination with my approval as university advisor.

Dr. Martha Yifiru
June, 2013

Dedication

This thesis is dedicated to my loving family and friends. Thank you all for your unfailing support in every way I needed.

Acknowledgement

Thank you GOD for YOUR uplifting Spirit, love, grace and mercy in my life. I know I won't make it without YOU!

I would like to thank my mum, W/ro Muluwork, my sisters Yodit, Tigist, Feven and Hanna, my one and only brother Yidnekachew, and my cousin Mihret for their unfailing love, encouragement and support. Miss you dad! My nephew Nazri and niece Susu, I love you so much!

I would like to express my deepest gratitude to my advisor, Dr. Martha Yifiru, for her commitment, support, constructive comments and guidance throughout this research.

I would also like to thank Jenefir Costa and Susan P.D. for their financial support in sponsoring my education. Thank you, Barbara Patton, for your input and encouraging support to continue my study.

I would like to thank those who participated in the manual summary preparation and evaluation process: Ashenafi W., Biruk F., Tigist A. and Solome G. Thank you also to Fikadu B., Girma D., Yeshinigus G. and Melese T. for your support and for sharing your experiences.

Last and definitely not least, to all my friends: thank you.

Where there is love, there is god and where there is god, there is life!

Addis Ashagre
June 2013

Table of Contents

List of Tables
List of Figures
List of Algorithms
LIST OF ACRONYMS
ABSTRACT

CHAPTER ONE
  1.1 INTRODUCTION
  1.2 STATEMENT OF THE PROBLEM
  1.3 OBJECTIVES OF THE RESEARCH
  1.4 SIGNIFICANCE OF THE STUDY
  1.5 SCOPE AND LIMITATION OF THE RESEARCH
  1.6 ORGANIZATION OF THE THESIS

CHAPTER TWO
2. LITERATURE REVIEW
  2.1 TEXT SUMMARIZATION
  2.2 AUTOMATIC TEXT SUMMARIZATION
    2.2.1 TYPES OF SUMMARIES
    2.2.2 STAGES OF AUTOMATIC TEXT SUMMARIZATION
    2.2.3 AUTOMATIC TEXT SUMMARIZATION SELECTION CRITERIA
    2.2.4 TECHNIQUES OF AUTOMATIC TEXT SUMMARIZATION
    2.2.5 EVALUATION TECHNIQUES
  2.3 GLOBAL RESEARCH
  2.4 AUTOMATIC TEXT SUMMARIZATION FOR LOCAL LANGUAGES

CHAPTER THREE
3. AMHARIC WRITING AND MORPHOLOGY
  3.1 INTRODUCTION
  3.2 AMHARIC WRITING SYSTEM
    3.2.1 AMHARIC ALPHABETS
    3.2.2 AMHARIC PUNCTUATION
    3.2.3 AMHARIC NUMBERS
  3.3 AMHARIC WORD CLASSES
    3.3.1 NOUN
    3.3.2 VERB
    3.3.3 ADVERB
    3.3.4 ADJECTIVE
    3.3.5 PREPOSITIONS
    3.3.6 CONJUNCTIONS
    3.3.7 INTERJECTIONS
  3.4 AMHARIC WORD FORMATION
    3.4.1 DERIVATIONAL MORPHOLOGY
    3.4.2 INFLECTIONAL MORPHOLOGY
  3.5 AMHARIC PHRASE CATEGORIES
    3.5.1 NOUN PHRASE (NP)
    3.5.2 VERB PHRASE (VP)
    3.5.3 ADVERBIAL PHRASE (AdvP)
    3.5.4 ADJECTIVAL PHRASE (AdjP)
    3.5.5 PREPOSITIONAL PHRASE (PP)
  3.6 AMHARIC PREFIXES ON VERBS IN CLAUSE FORMATION
  3.7 AMHARIC SENTENCE STRUCTURE
    3.7.1 SIMPLE SENTENCES
    3.7.2 COMPLEX SENTENCES
  3.8 AMHARIC WRITING STRUCTURE

CHAPTER FOUR
4. RESEARCH METHODOLOGY
  4.1 LITERATURE REVIEW
  4.2 AMHARIC LANGUAGE LEXICON GATHERING
  4.3 CORPUS PREPARATION
  4.4 CUSTOMIZATION OF THE OTS
    4.4.1 Requirements of the system
    4.4.2 System Design
    4.4.3 Implementation
  4.5 MANUAL SUMMARY PREPARATION
  4.6 SUMMARIZATION TECHNIQUE AND TOOLS USED
  4.7 EVALUATION TECHNIQUE

CHAPTER FIVE
5. IMPLEMENTATION OF AMHARIC OPEN TEXT SUMMARIZER
  5.1 INTRODUCTION
  5.2 DESCRIPTION OF OTS
    5.2.1 What Is OTS?
    5.2.2 How OTS Works?
    5.2.3 Effectiveness of OTS
  5.3 REQUIREMENTS OF THE SYSTEM
    5.3.1 List of Amharic Punctuation Marks
    5.3.2 List of Amharic Abbreviations
    5.3.3 List of Amharic Synonyms
    5.3.4 List of Amharic Stop Words
    5.3.5 Stemmer for Amharic
  5.4 PREPROCESSING INVOLVED
    5.4.1 Amharic News Article Gathering and Cleaning
    5.4.2 Manual Summary Preparation
    5.4.3 Normalization of Amharic Characters
    5.4.4 Rules for the Stemmers Used
    5.4.5 Customization of the OTS to Amharic News Summarization
    5.4.6 Redesigning Graphic User Interface for AOTS
  5.5 ARCHITECTURE OF THE SYSTEM
  5.6 DESCRIPTION OF THE SUMMARIZATION PROCESS

CHAPTER SIX
6. EXPERIMENTATION AND EVALUATION
  6.1 INTRODUCTION
  6.2 AUTOMATIC SUMMARIZATION
    6.2.1 E1 – Using OTS's Porter Stemmer
    6.2.2 E2 – After Introducing Amharic Stemmer
  6.3 EVALUATION AND RESULTS DISCUSSION
    6.3.1 Manual Evaluation for Experiment One
    6.3.2 Objective Evaluation for E1
    6.3.3 Objective Evaluation for E2
    6.3.4 Comparison of the two Experiments

CHAPTER SEVEN
7. CONCLUSION AND RECOMMENDATION
  7.1 CONCLUSION
  7.2 RECOMMENDATION

BIBLIOGRAPHY

APPENDIX
  I. List of Amharic characters, labialized characters and numbers
  II. List of Affixes
  III. Normalization List
  IV. List of Common Amharic Abbreviations and their expanded forms
  V. List of Some Common Amharic Synonyms
  VI. List of stop words
  VII. Dictionary Rules – "am.xml"
  VIII. Guidelines for Manual Summary Preparation
  IX. Guidelines for Summary Subjective Evaluation
  X. Grade Score Given Under Subjective Evaluation for 45 Selected Summaries
  XI. Manual Evaluation Results
  XII. Objective Evaluation Results for Experiment One
  XIII. Objective Evaluation Results for Experiment Two

Description:
A Thesis Submitted to the School of Graduate Studies of Addis Ababa University in Partial ... achieved in other languages like English. ... synonyms, compound words and other rules. The second one ... supreme court of Ethiopia.