Introduction to Arabic Natural Language Processing

120 Pages·2005·1.14 MB·English
by Nizar Habash

ACL’05 Tutorial
University of Michigan - Ann Arbor
June 25, 2005
(Last updated July 3rd, 2005)

Introduction to Arabic Natural Language Processing
Nizar Habash
Columbia University
Center for Computational Learning Systems

• Focus of this tutorial
  – Phenomena
  – Concepts
  – Approaches & Resources
• What is ‘Arabic’?
  – Arabic Script
  – Arabic Language
    • Modern Standard Arabic (MSA)
    • Arabic Dialects

Road Map
• Introduction
• Orthography
• Morphology
• Syntax
• Machine Translation Issues
• Dialects

Road Map
• Introduction
• Orthography
  – Arabic Script
  – MSA Phonology and Spelling
  – Recognizing Arabic vs. Persian/Urdu/Pashto/Kurdish/Sindhi/…
  – Encoding Issues
• Morphology
• Syntax
• Machine Translation Issues
• Dialects

Arabic Script

Arabic Script
Arabic script is an alphabet with allographic variants, optional zero-width diacritics, and common ligatures.
الخط العربي ("the Arabic script")
Arabic script is used to write many languages: Arabic, Persian, Kurdish, Urdu, Pashto, etc.

Arabic Script Alphabet
[Figure: the Arabic script alphabet, grouping letter forms and letter marks, and letters used in Arabic only vs. letters used in other languages (Persian, Kurdish, Urdu, Pashto, etc.), with a note on OCR output ambiguity]

Arabic Script Alphabet (MSA)
• Letters = form + mark
• Distinctive: ب /b/, ت /t/, ث /θ/, س /s/, ش /ʃ/
• Non-distinctive: ا أ إ ء ئ ؤ, all /ʔ/ (the glottal stop, aka hamza)

Arabic Script Letter Shapes
• No distinction between print and handwriting
• No capitalization
• Right-to-left
[Table: stand-alone, initial, medial, and final shapes of connective letters (ن ب ك م ش غ) and disconnective letters (ز د ا)]

Arabic Script Letter Shaping
• ك (k) + ت (t) + ب (b) = كتب /katab/ "to write"
• ك (k) + ت (t) + ا (ā) + ب (b) = كتاب /kitāb/ "book"
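The Arabic Script slide calls diacritics optional and zero-width. A minimal Python sketch of what that means in practice, assuming the marks of interest are the tanwin, short-vowel, shadda, and sukun characters U+064B through U+0652 (the tutorial does not define this range, and the name `strip_diacritics` is hypothetical), removes them with a regular expression and leaves the base letters untouched:

```python
import re

# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra,
# shadda, sukun. These are zero-width combining marks that ordinary
# Arabic text usually omits (assumed range for this sketch).
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652]")

def strip_diacritics(text: str) -> str:
    """Remove optional Arabic diacritics, keeping the base letters."""
    return ARABIC_DIACRITICS.sub("", text)

katab = "\u0643\u064E\u062A\u064E\u0628"  # كَتَب /katab/ "to write"
kutub = "\u0643\u064F\u062A\u064F\u0628"  # كُتُب /kutub/ "books"

# Both diacritized words reduce to the same undiacritized string كتب.
print(strip_diacritics(katab) == strip_diacritics(kutub))
```

Because the marks are optional, the undiacritized form كتب is ambiguous between readings such as /katab/ and /kutub/. A fuller normalizer might also handle marks outside this range, such as superscript alef (U+0670).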
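The MSA alphabet slide treats each letter as a form plus a mark: ب, ت, and ث share one form and differ only in their dots, as do س and ش. The sketch below makes that split concrete by discarding the marks. It covers only those five letters, uses the Unicode character U+066E ARABIC LETTER DOTLESS BEH to stand in for the shared form of ب/ت/ث, and the names `FORM_OF` and `skeleton` are hypothetical; it is an illustration, not a complete dotless-script (rasm) mapping.

```python
# Map each letter from the slide to its dotless base form.
# U+066E (dotless beh) stands for the shared form of beh/teh/theh;
# seen (U+0633) already is the dotless form of sheen.
FORM_OF = {
    "\u0628": "\u066E",  # ب /b/
    "\u062A": "\u066E",  # ت /t/
    "\u062B": "\u066E",  # ث /θ/
    "\u0633": "\u0633",  # س /s/
    "\u0634": "\u0633",  # ش /ʃ/
}

def skeleton(text: str) -> str:
    """Replace covered letters with their base form; keep everything else."""
    return "".join(FORM_OF.get(ch, ch) for ch in text)

# Three letters with three different sounds collapse to one form.
print(len({skeleton(ch) for ch in "\u0628\u062A\u062B"}))
```

Once the marks are gone, /b/, /t/, and /θ/ can no longer be told apart, which is one way to see why dropped or misread dots matter for OCR output.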
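The letter-shape and letter-shaping slides distinguish connective letters, which join on both sides, from disconnective letters such as ا, د, and ز, which never join the letter that follows them. A rough sketch of how that rule selects each letter's stand-alone, initial, medial, or final shape follows. It assumes undiacritized MSA input, ignores ligatures (such as lam-alef) and tatweel, and its disconnective list and function names (`joins_next`, `joins_prev`, `letter_positions`) are assumptions of the sketch rather than anything defined in the tutorial.

```python
# Disconnective letters join only the preceding letter, never the
# following one (simplified MSA list assumed for this sketch).
DISCONNECTIVE = set("اأإآدذرزوؤة")
# Hamza on the line joins on neither side.
NON_JOINING = set("ء")

def joins_next(ch: str) -> bool:
    """True if ch can connect to the letter that follows it."""
    return ch not in DISCONNECTIVE and ch not in NON_JOINING

def joins_prev(ch: str) -> bool:
    """True if ch can connect to the letter that precedes it."""
    return ch not in NON_JOINING

def letter_positions(word: str) -> list[str]:
    """Return the positional shape of each letter in an undiacritized word."""
    positions = []
    for i, ch in enumerate(word):
        to_prev = i > 0 and joins_next(word[i - 1]) and joins_prev(ch)
        to_next = i + 1 < len(word) and joins_next(ch) and joins_prev(word[i + 1])
        if to_prev and to_next:
            positions.append("medial")
        elif to_prev:
            positions.append("final")
        elif to_next:
            positions.append("initial")
        else:
            positions.append("stand-alone")
    return positions

print(letter_positions("كتب"))   # katab "to write"
print(letter_positions("كتاب"))  # kitāb "book"
```

For كتب every letter joins its neighbors, giving initial, medial, final. In كتاب the disconnective ا cannot join the following ب, so ا takes its final shape and ب is written in its stand-alone shape, which is why one word can appear as several connected pieces.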
