Speech and Language Processing
PRENTICE HALL SERIES
IN ARTIFICIAL INTELLIGENCE
Stuart Russell and Peter Norvig, Editors
GRAHAM ANSI Common Lisp
MUGGLETON Logical Foundations of Machine Learning
RUSSELL & NORVIG Artificial Intelligence: A Modern Approach
JURAFSKY & MARTIN Speech and Language Processing
Speech and Language Processing
An Introduction to Natural Language Processing, Computational Linguistics
and Speech Recognition
Daniel Jurafsky and James H. Martin
Draft of September 28, 1999. Do not cite without permission.
Contributing writers:
Andrew Kehler, Keith Vander Linden, Nigel Ward
Prentice Hall, Englewood Cliffs, New Jersey 07632
Library of Congress Cataloging-in-Publication Data
Jurafsky, Daniel S. (Daniel Saul)
Speech and Language Processing / Daniel Jurafsky, James H. Martin.
p. cm.
Includes bibliographical references and index.
ISBN
Publisher: Alan Apt
© 2000 by Prentice-Hall, Inc.
A Simon & Schuster Company
Englewood Cliffs, New Jersey 07632
The author and publisher of this book have used their best efforts in preparing this
book. These efforts include the development, research, and testing of the theories
and programs to determine their effectiveness. The author and publisher shall not
be liable in any event for incidental or consequential damages in connection with,
or arising out of, the furnishing, performance, or use of these programs.
All rights reserved. No part of this book may be
reproduced, in any form or by any means,
without permission in writing from the publisher.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada, Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Simon & Schuster Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
For my parents — D.J.
For Linda — J.M.
Summary of Contents
1 Introduction............................................ 1
I Words 19
2 Regular Expressions and Automata...................... 21
3 Morphology and Finite-State Transducers............... 57
4 Computational Phonology and Text-to-Speech........... 91
5 Probabilistic Models of Pronunciation and Spelling...... 139
6 N-grams............................................... 189
7 HMMs and Speech Recognition......................... 233
II Syntax 283
8 Word Classes and Part-of-Speech Tagging............... 285
9 Context-Free Grammars for English.................... 319
10 Parsing with Context-Free Grammars................... 353
11 Features and Unification................................ 391
12 Lexicalized and Probabilistic Parsing.................... 443
13 Language and Complexity.............................. 473
III Semantics 495
14 Representing Meaning.................................. 497
15 Semantic Analysis...................................... 543
16 Lexical Semantics...................................... 587
17 Word Sense Disambiguation and Information Retrieval.. 627
IV Pragmatics 661
18 Discourse.............................................. 663
19 Dialogue and Conversational Agents..................... 715
20 Generation............................................. 759
21 Machine Translation.................................... 797
A Regular Expression Operators.......................... 829
B The Porter Stemming Algorithm........................ 831
C C5 and C7 tagsets...................................... 835
D Training HMMs: The Forward-Backward Algorithm.... 841
Bibliography 851
Index 923
Contents
1 Introduction 1
1.1 Knowledge in Speech and Language Processing . . . . . . 2
1.2 Ambiguity . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Models and Algorithms . . . . . . . . . . . . . . . . . . . 5
1.4 Language, Thought, and Understanding . . . . . . . . . . . 6
1.5 The State of the Art and The Near-Term Future . . . . . . . 9
1.6 Some Brief History . . . . . . . . . . . . . . . . . . . . . 10
Foundational Insights: 1940’s and 1950’s . . . . . . . . . . 10
The Two Camps: 1957–1970 . . . . . . . . . . . . . . . . 11
Four Paradigms: 1970–1983 . . . . . . . . . . . . . . . . . 13
Empiricism and Finite State Models Redux: 1983–1993 . . 14
The Field Comes Together: 1994–1999 . . . . . . . . . . . 14
A Final Brief Note on Psychology . . . . . . . . . . . . . . 15
1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . 16
I Words 19
2 Regular Expressions and Automata 21
2.1 Regular Expressions . . . . . . . . . . . . . . . . . . . . . 22
Basic Regular Expression Patterns . . . . . . . . . . . . . 23
Disjunction, Grouping, and Precedence . . . . . . . . . . . 27
A simple example . . . . . . . . . . . . . . . . . . . . . . 28
A More Complex Example . . . . . . . . . . . . . . . . . 29
Advanced Operators . . . . . . . . . . . . . . . . . . . . . 30
Regular Expression Substitution, Memory, and ELIZA . . . 31
2.2 Finite-State Automata . . . . . . . . . . . . . . . . . . . . 33
Using an FSA to Recognize Sheeptalk . . . . . . . . . . . 34
Formal Languages . . . . . . . . . . . . . . . . . . . . . . 38
Another Example . . . . . . . . . . . . . . . . . . . . . . 39
Nondeterministic FSAs . . . . . . . . . . . . . . . . . . . 40
Using an NFSA to accept strings . . . . . . . . . . . . . . 42
Recognition as Search . . . . . . . . . . . . . . . . . . . . 44
Relating Deterministic and Non-deterministic Automata . . 48
2.3 Regular Languages and FSAs . . . . . . . . . . . . . . . . 49
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . 52
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3 Morphology and Finite-State Transducers 57
3.1 Survey of (Mostly) English Morphology . . . . . . . . . . 59
Inflectional Morphology . . . . . . . . . . . . . . . . . . . 61
Derivational Morphology . . . . . . . . . . . . . . . . . . 63
3.2 Finite-State Morphological Parsing . . . . . . . . . . . . . 65
The Lexicon and Morphotactics . . . . . . . . . . . . . . . 66
Morphological Parsing with Finite-State Transducers . . . 71
Orthographic Rules and Finite-State Transducers . . . . . . 76
3.3 Combining FST Lexicon and Rules . . . . . . . . . . . . . 79
3.4 Lexicon-free FSTs: The Porter Stemmer . . . . . . . . . . 82
3.5 Human Morphological Processing . . . . . . . . . . . . . 84
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Bibliographical and Historical Notes . . . . . . . . . . . . . . . . 87
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4 Computational Phonology and Text-to-Speech 91
4.1 Speech Sounds and Phonetic Transcription . . . . . . . . . 92
The Vocal Organs . . . . . . . . . . . . . . . . . . . . . . 94
Consonants: Place of Articulation . . . . . . . . . . . . . . 97
Consonants: Manner of Articulation . . . . . . . . . . . . 98
Vowels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.2 The Phoneme and Phonological Rules . . . . . . . . . . . 102
4.3 Phonological Rules and Transducers . . . . . . . . . . . . 104
4.4 Advanced Issues in Computational Phonology . . . . . . . 109
Harmony . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Templatic Morphology . . . . . . . . . . . . . . . . . . . 111
Optimality Theory . . . . . . . . . . . . . . . . . . . . . . 112
4.5 Machine Learning of Phonological Rules . . . . . . . . . . 117
4.6 Mapping Text to Phones for TTS . . . . . . . . . . . . . . 119
Pronunciation dictionaries . . . . . . . . . . . . . . . . . . 119
Beyond Dictionary Lookup: Text Analysis . . . . . . . . . 121
An FST-based pronunciation lexicon . . . . . . . . . . . . 124
4.7 Prosody in TTS . . . . . . . . . . . . . . . . . . . . . . . 129
Phonological Aspects of Prosody . . . . . . . . . . . . . . 129
Phonetic or Acoustic Aspects of Prosody . . . . . . . . . . 131
Prosody in Speech Synthesis . . . . . . . . . . . . . . . . 131