ebook img

Foundations of Computational Linguistics: Human-Computer Communication in Natural Language PDF

518 Pages·2014·8.05 MB·english
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Foundations of Computational Linguistics: Human-Computer Communication in Natural Language

Roland Hausser Foundations of Computational Linguistics Human-Computer Communication in Natural Language Third Edition Foundations of Computational Linguistics Roland Hausser Foundations of Computational Linguistics Human-Computer Communication in Natural Language Third Edition RolandHausser AbteilungfürComputerlinguistik(retired) Friedrich-Alexander-Universität Erlangen-Nürnberg Erlangen,Germany ISBN978-3-642-41430-5 ISBN978-3-642-41431-2(eBook) DOI10.1007/978-3-642-41431-2 SpringerHeidelbergNewYorkDordrechtLondon LibraryofCongressControlNumber:2013956746 ©Springer-VerlagBerlinHeidelberg1999,2001,2014 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartofthema- terialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation,broadcasting, reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformationstorageandretrieval,elec- tronicadaptation,computersoftware,orbysimilarordissimilarmethodologynowknownorhereafterdeveloped. Exemptedfromthislegalreservationarebriefexcerptsinconnectionwithreviewsorscholarlyanalysisormaterial suppliedspecificallyforthepurposeofbeingenteredandexecutedonacomputersystem,forexclusiveusebythe purchaserofthework.Duplicationofthispublicationorpartsthereofispermittedonlyundertheprovisionsofthe CopyrightLawofthePublisher’slocation,initscurrentversion,andpermissionforusemustalwaysbeobtainedfrom Springer.PermissionsforusemaybeobtainedthroughRightsLinkattheCopyrightClearanceCenter.Violationsare liabletoprosecutionundertherespectiveCopyrightLaw. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublicationdoesnot imply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevantprotectivelawsand regulationsandthereforefreeforgeneraluse. Whiletheadviceandinformationinthisbookarebelievedtobetrueandaccurateatthedateofpublication,neither theauthorsnortheeditorsnorthepublishercanacceptanylegalresponsibilityforanyerrorsoromissionsthatmay bemade.Thepublishermakesnowarranty,expressorimplied,withrespecttothematerialcontainedherein. Printedonacid-freepaper SpringerispartofSpringerScience+BusinessMedia(www.springer.com) Preface to the Third Edition The Third Edition has been gently modernized, leaving the overview of traditional, theoretical,andcomputationallinguistics,analyticphilosophyoflanguage,andmath- ematicalcomplexitytheorywiththeirhistoricalbackgroundsintact.Theformatofthe empiricalanalysesofEnglishandGermansyntaxandsemanticshasbeenadaptedto current practice. Sects. 22.3–24.5 have been rewritten to focus more sharply on the constructionofatalkingrobot. BecausetheSecondEditionisquitelong,oneaimofthepresenteditionhasbeento shorten the number of pages. Little is lost, however, because later publications make upforareductionofabout10percent.Thebroad,leisurelystyleinducedbythebook’s origin as material for a four semester introductory lecture course, however, has been preserved. Special thanks to Ronan Nugent, Senior Editor of Computer Science at Springer, forhiscontinuedsupport. Erlangen,Germany RolandHausser June2013 v Remark on the Use of Risci Science books often use a menagerie of tables, graphs, examples, definitions, theo- rems,lemmata,schemata,etc.,whichareeither(i)notnumberedor(ii)usedifferent countersforeachkind.Fortheauthor,(i)makesitdifficulttorefertoaparticularitem. Forthereader,(ii)makesitdifficulttofindtheitem. As a remedy, we use a uniform container called riscus1 to hold any of such items. A riscus begins with a numbered head line and ends when normal text resumes. In thisbook,thenumberofariscusconsistsofthreeparts,e.g.,3.2.1,whichspecifythe locationintermsofthechapterwithinthetext,thesectionwithinthechapter,andthe riscuswithinthesection. Independently of the riscus numbering, the head line may classify and number the itemcontained,asinthefollowingfictitiousexample: 3.2.1 THEOREM 9: THE POLYNOMIAL COMPLEXITY OF CF LANGUAGES Contentofthetheoremfollowedbytheproof. Elsewhere in the text, the above riscus may be referred to with something like As shown in 3.2.1, ... or In Theorem 9 (3.2.1) ... . In this way, the riscus numbering allows the author to be brief yet precise because the reader may be guided to a more detailedpresentationgiveninanotherplace.Themethodworksregardlessofwhether thebookisahardcopyoraneBook. The numbering of a riscus may be provided automatically, for example, by using the LATEX \subsection command for riscus headings. The riscus numbering, like that of chapters and sections, may be hooked up with automatic cross-referencing, for example,byadding\usepackage{hyperref}totheLATEXpreambleofthetext. As a result, the reader of an eBook may click on a three part number to be trans- portedautomaticallytotheriscusinquestion,andmaygetbacktotheearlierposition withasimplecommandlikecntrl[.Thecomfortofaccessingariscusdirectly,instead ofhavingtoleafthroughthechapterandthesectionasinahardcopy,isperhapsnot theleastadvantageaddedbythetransitionfromahardcopytoaneBook. Erlangen,Germany RolandHausser 1riscus<i>m,istheLatinwordforsuitcase.Ariscuswasmadeofwovenwillowbranches coveredwithfur. vii Preface to the First Edition The central task of a future-oriented computational linguistics is the development of cognitivemachineswhichhumanscanfreelytalkwithintheirrespectivenaturallan- guage.Inthelongrun,thistaskwillensurethedevelopmentofafunctionaltheoryof language,anobjectivemethodofverification,andawiderangeofapplications. Natural communication requires not only verbal processing, but also non-verbal recognitionandaction.Thereforethecontentofthistextbookisorganizedasatheory oflanguagefortheconstructionoftalkingrobots.Themaintopicisthemechanismof naturallanguagecommunicationinboththespeakerandthehearer. Thecontentisdividedintothefollowingparts: I. TheoryofLanguage II. TheoryofGrammar III. MorphologyandSyntax IV. SemanticsandPragmatics Eachpartconsistsof6chapters.Eachofthe24chaptersconsistsof5sections.Atotal of797exerciseshelpinreviewingkeyideasandimportantproblems. Part I begins with current applications of computational linguistics. Then it de- scribesanewtheoryoflanguage,thefunctioningofwhichisillustrated bytherobot CURIOUS.ThistheoryisreferredtowiththeacronymSLIM,whichstandsforSurface compositionalLinearInternalMatching.Itincludesacognitivefoundationofseman- tic primitives, a theory of signs, a structural delineation of the components syntax, semantics, and pragmatics, as well as their functional integration in the speaker’s ut- teranceandthehearer’sinterpretation.Thepresentationreferstoothercontemporary theoriesoflanguage,especiallythoseofChomskyandGrice,aswellastotheclassic theories of Frege, Peirce, de Saussure, Bühler, and Shannon and Weaver, explaining their formal and methodological foundations as well as their historical background andmotivations. Part II presents the theory of formal grammar and its methodological, mathemat- ical, and computational roles in the description of natural languages. A description of Categorial (C) Grammar and Phrase Structure (PS) Grammar is combined with an introduction to the basic notions and linguistic motivation of generative gram- mar. Further topics are the declarative vs. procedural aspects of parsing and gener- ix x PrefacetotheFirstEdition ation, type transparency, as well as the relation between formalisms and complexity classes. It is shown that the principle of possible substitutions causes empirical and mathematicalproblemsforthedescriptionofnaturallanguage.Asanalternative,the principle of possible continuations is formalized as LA Grammar. LA stands for the left-associativederivationorderwhichmodelsthetime-linearnatureoflanguage.Ap- plications of LA Grammar to relevant artificial languages show that its hierarchy of formallanguagesisorthogonaltothatofPSGrammar.WithintheLAhierarchy,natu- rallanguageisinthelowestcomplexityclass,namelytheclassofC1languageswhich parseinlineartime. PartIIIdescribesthemorphologyandsyntaxofnaturallanguage.Ageneraldescrip- tion of the notions word, word form, morpheme, and allomorph, the morphological processes of inflection, derivation, and composition, as well as the different possi- ble methods of automatic word form recognition is followed by the morphological analysisofEnglishwithintheframeworkofLAGrammar.Thenthesyntacticprinci- ples of valency, agreement, and word order are explained within the left-associative approach. LA Grammars for English and German are developed by systematically extending a small initial system to handle more and more constructions such as the fixed vs. free word order of English and German, respectively, the structure of com- plex noun phrasesandcomplex verbs,interrogatives, subordinate clauses,etc. These analysesarepresentedintheformofexplicitgrammarsandsamplederivations. Part IV describes the semantics and pragmatics of natural language. The general description of language interpretation begins by comparing three different types of semantics, namely those of logical languages, programming languages, and natural languages. Based on Tarski’s foundation of logical semantics and his reconstruction of the Epimenides paradox, the possibility of applying logical semantics to natural language is investigated. Alternative analyses of intensional contexts, propositional attitudes, and the phenomenon of vagueness illustrate that different types of seman- tics are based on different ontologies which greatly influence the empirical results. It is shown how a semantic interpretation may cause an increase in complexity and how this is to be avoided within the SLIM theory of language. The last two chap- ters,23and24,analyzetheinterpretation bythehearerandtheconceptualization by thespeakerasatime-linearnavigation throughadatabasecalled wordbank.Aword bank allows the storage of arbitrary propositional content and is implemented as a highlyrestrictedspecializationofaclassic(i.e.,record-based)networkdatabase.The autonomousnavigationthroughawordbankiscontrolledbytheexplicitrulesofsuit- ableLAGrammars. As supplementary reading the Survey of the State of the Art in Human Language Technology,RonCole(ed.)1998isrecommended.Thisbookcontainsabout90con- tributions by different specialists giving detailed snapshots of their research in lan- guagetheoryandtechnology. Erlangen,Germany RolandHausser June1999 Contents PartI. TheoryofLanguage 1. ComputationalAnalysisofNaturalLanguage . . . . . . . . . . . . . 3 1.1 Human-ComputerCommunication . . . . . . . . . . . . . . . . . . 4 1.2 LanguageSciencesandTheirComponents. . . . . . . . . . . . . . 7 1.3 MethodsandApplicationsofComputationalLinguistics . . . . . . 12 1.4 ElectronicMediuminRecognitionandSynthesis . . . . . . . . . . 13 1.5 SecondGutenbergRevolution . . . . . . . . . . . . . . . . . . . . 16 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2. Smartvs.SolidSolutions . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.1 IndexingandRetrievalinTextualDatabases . . . . . . . . . . . . . 25 2.2 UsingGrammaticalKnowledge . . . . . . . . . . . . . . . . . . . 29 2.3 Smartvs.SolidSolutionsinComputationalLinguistics . . . . . . . 31 2.4 BeginningsofMachineTranslation . . . . . . . . . . . . . . . . . 33 2.5 MachineTranslationToday . . . . . . . . . . . . . . . . . . . . . . 37 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3. CognitiveFoundationsofSemantics . . . . . . . . . . . . . . . . . . . 45 3.1 PrototypeofCommunication . . . . . . . . . . . . . . . . . . . . . 45 3.2 FromPerceptiontoRecognition . . . . . . . . . . . . . . . . . . . 47 3.3 IconicityofFormalConcepts . . . . . . . . . . . . . . . . . . . . . 50 3.4 ContextPropositions . . . . . . . . . . . . . . . . . . . . . . . . . 56 3.5 RecognitionandAction . . . . . . . . . . . . . . . . . . . . . . . . 59 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4. LanguageCommunication . . . . . . . . . . . . . . . . . . . . . . . . 65 4.1 AddingLanguage . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4.2 ModelingReference . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.3 UsingLiteralMeaning . . . . . . . . . . . . . . . . . . . . . . . . 71 4.4 Frege’sPrinciple . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.5 SurfaceCompositionality . . . . . . . . . . . . . . . . . . . . . . . 76 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 xi xii Contents 5. UsingLanguageSignsonSuitableContexts . . . . . . . . . . . . . . 87 5.1 Bühler’sOrganonModel . . . . . . . . . . . . . . . . . . . . . . . 87 5.2 PragmaticsofToolsandPragmaticsofWords . . . . . . . . . . . . 89 5.3 FindingtheCorrectSubcontext . . . . . . . . . . . . . . . . . . . . 91 5.4 LanguageProductionandInterpretation . . . . . . . . . . . . . . . 94 5.5 ThoughtastheMotorofSpontaneousProduction . . . . . . . . . . 97 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 6. StructureandFunctioningofSigns . . . . . . . . . . . . . . . . . . . 103 6.1 ReferenceMechanismsofDifferentSignKinds . . . . . . . . . . . 103 6.2 InternalStructureofSymbolsandIndexicals . . . . . . . . . . . . 107 6.3 RepeatingReference . . . . . . . . . . . . . . . . . . . . . . . . . 110 6.4 ExceptionalPropertiesofIconandName . . . . . . . . . . . . . . 114 6.5 Pictures,Pictograms,andLetters . . . . . . . . . . . . . . . . . . . 118 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 PartII. TheoryofGrammar 7. FormalGrammar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 7.1 LanguageasaSubsetoftheFreeMonoid . . . . . . . . . . . . . . 127 7.2 MethodologicalReasonsforFormalGrammar . . . . . . . . . . . . 132 7.3 AdequacyofFormalGrammars . . . . . . . . . . . . . . . . . . . 133 7.4 FormalismofCGrammar . . . . . . . . . . . . . . . . . . . . . . 134 7.5 CGrammarforNaturalLanguage . . . . . . . . . . . . . . . . . . 138 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 8. LanguageHierarchiesandComplexity . . . . . . . . . . . . . . . . . 143 8.1 FormalismofPSGrammar . . . . . . . . . . . . . . . . . . . . . . 144 8.2 LanguageClassesandComputationalComplexity . . . . . . . . . . 146 8.3 GenerativeCapacityandFormalLanguageClasses . . . . . . . . . 149 8.4 PSGrammarforNaturalLanguage . . . . . . . . . . . . . . . . . . 154 8.5 ConstituentStructureParadox . . . . . . . . . . . . . . . . . . . . 159 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 9. BasicNotionsofParsing . . . . . . . . . . . . . . . . . . . . . . . . . 167 9.1 DeclarativeandProceduralAspectsofParsing . . . . . . . . . . . . 167 9.2 FittingGrammarontoLanguage . . . . . . . . . . . . . . . . . . . 169 9.3 TypeTransparencyBetweenGrammarandParser . . . . . . . . . . 174 9.4 Input-OutputEquivalencewiththeSpeaker-Hearer . . . . . . . . . 180 9.5 DesiderataofGrammarforAchievingConvergence . . . . . . . . . 183 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 10. Left-AssociativeGrammar(LAG) . . . . . . . . . . . . . . . . . . . . 189 10.1 RuleKindsandDerivationOrder . . . . . . . . . . . . . . . . . . . 189 10.2 FormalismofLAGrammar . . . . . . . . . . . . . . . . . . . . . . 193

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.