Natural Language Systems: COMP34412

Allan Ramsay
School of Computer Science, University of Manchester, Manchester M13 9PL, UK

Contents

How does language work?
Architecture of a NL system
COMP24412
COMP34412
Course outline (draft)
What makes different sounds different?
Chords
Sounds & Fourier Transforms
Fourier analysis
Speech recognition
Spectrogram (Fourier analysis): AAA vs OOO
Formants
Spectrogram (Fourier analysis): CAT vs PAT
Spectrogram (quantised): CAT vs PAT
Spectrogram (rate of change): CAT vs PAT
Mel Frequency Cepstral Coefficients
CHECKPOINT
Hidden Markov Models
Viterbi algorithm
Features, networks, network composition
What kinds of observations are we going to use?
CHECKPOINT
Speech synthesis
Using forced alignment to find diphones
Finding the best sequence of units
Pitch and duration
CHECKPOINT
The lexicon
Morphographemics, morphophonemics (‘spelling rules’)
Exercises for the reader
CHECKPOINT
Morphology
Categorial descriptions
CHECKPOINT
Backoff: stemming, tagging
Stemming
Tagging
Measuring accuracy
HMM-based tagging
Transformation-based learning (TBL)
Templates (see tbl.py for the full set)
Special cases
CHECKPOINT
Syntax & Parsing
Regular expressions as NL grammars
Regular expressions
Checkpoint
Deterministic dependency parsing (Nivre et al. 2007; Nivre 2003)
Learning from a treebank
Headed phrase structure trees ≡ dependency trees
Learning a set of parsing rules
Long-distance dependency revisited
Maximum spanning trees
‘Edmond’s algorithm’
CHECKPOINT
What can you do with it?
Better models
CHECKPOINT
Hand-coded lexical relations
WordNet
WordNet for pairwise entailment
WordNet for similarity measurements
CHECKPOINT
Textual entailment
Montague semantics
Bag of words
String-edit distance
Subsumption on parse trees
Domain rules
General principles
Where do you get rules from?
CHECKPOINT

Recommended reading:

These notes, which you can get to at http://syllabus.cs.manchester.ac.uk/ugt/2017/COMP34412/COMP34412.pdf: I set my exams by trawling through my own notes, so everything I want you to know will be in here. You may find sets of notes in other places: don’t use them. In particular, don’t use last year’s notes, because the course was entirely different last year. And don’t read too far ahead, because they will evolve as we go.

‘Natural Language Processing with Python’, Bird, Klein & Loper: hard copy published by O’Reilly, but also available online with supporting software at http://www.nltk.org/. I will be using some of their software.

‘SPEECH and LANGUAGE PROCESSING: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition’, Jurafsky & Martin: for anything that I don’t explain properly. Not actually the best book for semantics, but apart from that ...

Bits of code live in /opt/info/courses/COMP34412/PROGRAMS.

Office hour: mail me, catch me after a lecture.

I INTRO

How do programs that let you talk to your computer work?

How do programs that let you ask your computer questions whose answers are provided somewhere on the web work?

How do programs that translate documents that are written in some foreign language work?

To do any of these things, you have to know how language works.

And you have to be able to express your knowledge as a program.

But language is very complicated.