
Speech and Language Processing [draft] PDF

1039 Pages·2008·41.21 MB·english

Preview Speech and Language Processing [draft]

Speech and Language Processing

PRENTICE HALL SERIES IN ARTIFICIAL INTELLIGENCE
Stuart Russell and Peter Norvig, Editors

FORSYTH & PONCE      Computer Vision: A Modern Approach
GRAHAM               ANSI Common Lisp
JURAFSKY & MARTIN    Speech and Language Processing
NEAPOLITAN           Learning Bayesian Networks
RUSSELL & NORVIG     Artificial Intelligence: A Modern Approach

Speech and Language Processing
An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

Daniel Jurafsky and James H. Martin

Upper Saddle River, New Jersey 07458

Library of Congress Cataloging-in-Publication Data
Jurafsky, Daniel S. (Daniel Saul)
Speech and Language Processing / Daniel Jurafsky, James H. Martin.
p. cm.
Includes bibliographical references and index.
ISBN 0-13-095069-6 FIXTHIS

Editor-in-Chief: FIXTHISSTUFF
Publisher: Tracy Dunkelberger
Editorial/production supervision: Scott Disanno
Editorial assistant:
Executive managing editor:
Cover design director:
Cover design execution:
Manufacturing manager:
Manufacturing buyer:
Assistant vice-president of production and manufacturing:
Cover design: Daniel Jurafsky, James H. Martin, and Linda Martin.

FIXTHIS The front cover drawing is the action for the Jacquard Loom (Usher, 1954). The back cover drawing is Alexander Graham Bell’s Gallows telephone (Rhodes, 1929).

This book was set in Times-Roman and TIPA (IPA) by the authors using LaTeX2e.

© 2008 by Prentice-Hall, Inc.
Pearson Higher Education
Upper Saddle River, New Jersey 07458

The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
ISBN 0-13-095069-6 FIXTHISTOO

Prentice-Hall International (UK) Limited, London
Prentice-Hall of Australia Pty. Limited, Sydney
Prentice-Hall Canada, Inc., Toronto
Prentice-Hall Hispanoamericana, S.A., Mexico
Prentice-Hall of India Private Limited, New Delhi
Prentice-Hall of Japan, Inc., Tokyo
Prentice-Hall Asia Pte. Ltd., Singapore
Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro

For — D.J.
For — J.M.

Summary of Contents

Preface
1 Introduction

I Words
2 Regular Expressions and Automata
3 Words & Transducers
4 N-grams
5 Part-of-Speech Tagging
6 Hidden Markov and Maximum Entropy Models

II Speech
7 Phonetics
8 Speech Synthesis
9 Automatic Speech Recognition
10 Speech Recognition: Advanced Topics
11 Computational Phonology

III Syntax
12 Formal Grammars of English
13 Parsing with Context-Free Grammars
14 Statistical Parsing
15 Features and Unification
16 Language and Complexity

IV Semantics and Pragmatics
17 Representing Meaning
18 Computational Semantics
19 Lexical Semantics
20 Computational Lexical Semantics
21 Computational Discourse

V Applications
22 Information Extraction
23 Question Answering and Summarization
24 Dialogue and Conversational Agents
25 Machine Translation

Bibliography
Index

Contents

Preface

1 Introduction
  1.1 Knowledge in Speech and Language Processing
  1.2 Ambiguity
  1.3 Models and Algorithms
  1.4 Language, Thought, and Understanding
  1.5 The State of the Art
  1.6 Some Brief History
    1.6.1 Foundational Insights: 1940s and 1950s
    1.6.2 The Two Camps: 1957–1970
    1.6.3 Four Paradigms: 1970–1983
    1.6.4 Empiricism and Finite State Models Redux: 1983–1993
    1.6.5 The Field Comes Together: 1994–1999
    1.6.6 The Rise of Machine Learning: 2000–2007
    1.6.7 On Multiple Discoveries
    1.6.8 A Final Brief Note on Psychology
  1.7 Summary
  Bibliographical and Historical Notes

I Words

2 Regular Expressions and Automata
  2.1 Regular Expressions
    2.1.1 Basic Regular Expression Patterns
    2.1.2 Disjunction, Grouping, and Precedence
    2.1.3 A Simple Example
    2.1.4 A More Complex Example
    2.1.5 Advanced Operators
    2.1.6 Regular Expression Substitution, Memory, and ELIZA
  2.2 Finite-State Automata
    2.2.1 Using an FSA to Recognize Sheeptalk
    2.2.2 Formal Languages
    2.2.3 Another Example
    2.2.4 Non-Deterministic FSAs
    2.2.5 Using an NFSA to Accept Strings
    2.2.6 Recognition as Search
    2.2.7 Relating Deterministic and Non-Deterministic Automata
  2.3 Regular Languages and FSAs
  2.4 Summary
  Bibliographical and Historical Notes
  Exercises

3 Words & Transducers
  3.1 Survey of (Mostly) English Morphology
    3.1.1 Inflectional Morphology
    3.1.2 Derivational Morphology
    3.1.3 Cliticization
    3.1.4 Non-concatenative Morphology
    3.1.5 Agreement
  3.2 Finite-State Morphological Parsing
  3.3 Building a Finite-State Lexicon
  3.4 Finite-State Transducers
    3.4.1 Sequential Transducers and Determinism
  3.5 FSTs for Morphological Parsing
  3.6 Transducers and Orthographic Rules
  3.7 Combining FST Lexicon and Rules
  3.8 Lexicon-Free FSTs: The Porter Stemmer
  3.9 Word and Sentence Tokenization
    3.9.1 Segmentation in Chinese
  3.10 Detecting and Correcting Spelling Errors
  3.11 Minimum Edit Distance
  3.12 Human Morphological Processing
  3.13 Summary
  Bibliographical and Historical Notes
  Exercises

4 N-grams
  4.1 Counting Words in Corpora
  4.2 Simple (Unsmoothed) N-grams
  4.3 Training and Test Sets
    4.3.1 N-gram Sensitivity to the Training Corpus
    4.3.2 Unknown Words: Open versus closed vocabulary tasks
  4.4 Evaluating N-grams: Perplexity
  4.5 Smoothing
    4.5.1 Laplace Smoothing
    4.5.2 Good-Turing Discounting
    4.5.3 Some advanced issues in Good-Turing estimation
  4.6 Interpolation
  4.7 Backoff
    4.7.1 Advanced: Details of computing Katz backoff α and P*
  4.8 Practical Issues: Toolkits and Data Formats
  4.9 Advanced Issues in Language Modeling
    4.9.1 Advanced Smoothing Methods: Kneser-Ney Smoothing
    4.9.2 Class-based N-grams
    4.9.3 Language Model Adaptation and Using the Web
    4.9.4 Using Longer Distance Information: A Brief Summary
  4.10 Advanced: Information Theory Background
    4.10.1 Cross-Entropy for Comparing Models
