ATLANTIS THINKING MACHINES
VOLUME 3
SERIES EDITOR: KAI-UWE KÜHNBERGER

Atlantis Thinking Machines
Series Editor: Kai-Uwe Kühnberger
Institute of Cognitive Science, University of Osnabrück, Germany
(ISSN: 1877-3273)

Aims and scope of the series

This series publishes books resulting from theoretical research on and reproductions of general Artificial Intelligence (AI). The book series focuses on the establishment of new theories and paradigms in AI. At the same time, the series aims at exploring multiple scientific angles and methodologies, including results from research in cognitive science, neuroscience, theoretical and experimental AI, biology and from innovative interdisciplinary methodologies.

All books in this series are co-published with Springer.

For more information on this series and our other book series, please visit our website at: www.atlantis-press.com/publications/books

AMSTERDAM – PARIS – BEIJING
© ATLANTIS PRESS

Integration of World Knowledge for Natural Language Understanding

Ekaterina Ovchinnikova
USC ISI
4676 Admiralty Way
Marina del Rey, CA 90292
USA

Atlantis Press
8, square des Bouleaux
75019 Paris, France

For information on all Atlantis Press publications, visit our website at: www.atlantis-press.com

Copyright
This book, or any parts thereof, may not be reproduced for commercial purposes in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system known or to be invented, without prior permission from the Publisher.

Atlantis Thinking Machines
Volume 1: Enaction, Embodiment, Evolutionary Robotics. Simulation Models for a Post-Cognitivist Science of Mind - Marieke Rohde, Ezequiel A. Di Paolo
Volume 2: Real-World Reasoning: Toward Scalable, Uncertain Spatiotemporal, Contextual and Causal Inference - Ben Goertzel, Nil Geisweiller, Lúcio Coelho, Predrag Janičić, Cassio Pennachin

ISBNs
Print: 978-94-91216-52-7
E-Book: 978-94-91216-53-4
ISSN: 1877-3273

© 2012 ATLANTIS PRESS

Foreword

Inference-based natural language understanding (NLU) was a thriving area of research in the 1970s and 1980s.
It resulted in good theoretical work and in interesting small-scale systems. But in the early 1990s it foundered on three difficulties:

• Parsers were not accurate enough to produce predicate-argument relations reliably, so that inference had no place to start.
• Inference processes were not efficient enough nor accurate enough.
• There was no large knowledge base designed for NLU applications.

The first of these difficulties has been overcome by progress in statistical parsing. The second problem is one that many people, including Ekaterina Ovchinnikova, are working on now. The research described in this volume addresses the third difficulty, and indeed shows considerable promise in overcoming it. For this reason, I believe Dr. Ovchinnikova's work has a real potential to reignite interest in inference-based NLU in the computational linguistics community.

A key notion in her work is that there already exists sufficient world knowledge in a variety of resources, at a level of precision that enables their translation into formal logic. To my mind, the most important of these are WordNet and FrameNet, especially the latter, and she describes the kind of information one can get out of these resources. She exploits in particular the hierarchical information and the glosses in WordNet, generating 600,000 axioms. She also describes how one can utilize FrameNet to generate about 50,000 axioms representing relations between words and frames, and about 5000 axioms representing frame-frame relations. Her analysis of FrameNet is quite thorough, and I found this part of her work inspiring.

She also critically discusses foundational ontologies such as DOLCE, SUMO, and OpenCyc, and domain-specific ontologies of the sort being constructed for the Semantic Web. She examines the problems raised by semi-formal ontologies, like YAGO and ConceptNet, which have been gleaned from text or Netizens and which may be more difficult to translate into formal logic. She also shows how to use distributional data for a default mode of processing when the required knowledge is not available.

Her use of knowledge from a variety of resources, combined into a single system, leads to the very hard problem of ensuring consistency in such a knowledge base. She engages in a very close study of the kinds of conceptual inconsistencies that occur in FrameNet and in Description Logic ontologies. She then provides algorithms for finding and resolving inconsistencies in these resources. I found this part of her work especially impressive.

She examines three forms of inference: standard deduction, weighted abduction, and reasoning in description logics, explicating the strengths and weaknesses of each.

Finally, she evaluates her work on inference-based NLU by applying her reasoning engines to the Recognizing Textual Entailment problem. She uses the RTE-2 test set and shows that her approach, with no special tuning to the RTE task, achieves state-of-the-art performance. She also evaluates her approach, with similarly excellent results, on the Semantic Role Labeling task and on paraphrasing noun-noun dependencies, both of which fall out as a by-product of weighted abduction.

So the research described here is very exciting indeed. It is a solid achievement on its own and it promises to open doors to much greater progress in automatic natural language understanding in the very near future.

Jerry R. Hobbs
Information Sciences Institute
University of Southern California
Marina del Rey, California

Acknowledgments

The research presented in this book is based on my PhD thesis, which would not have been possible without the help of many people. In the first place I would like to thank my thesis advisor Kai-Uwe Kühnberger who has scientifically and organizationally supported all my research adventures, giving me freedom to try whatever I thought was interesting.

I am indebted to Jerry Hobbs who has invited me to visit the Information Sciences Institute where I have spent the most productive six months of my dissertation work. Jerry has introduced me to the exciting field of abductive reasoning and encouraged me to combine this approach with my research efforts, which turned out to be highly successful.

I owe my deepest gratitude to Frank Richter who has supported me from the very beginning of my research career. Whenever I needed scientific advice or organizational support, Frank was always there to help.

My gratitude especially goes to the ISI colleagues. I particularly benefited from discussions with Eduard Hovy. Thanks to Rutu Mulkar-Mehta, who has developed and supported the Mini-TACITUS system, I managed to implement the extensions to the system that many of my research results are based upon. I very much thank Niloofar Montazeri who shared with me the tedious work on the Recognizing Textual Entailment challenge.

I thank Nicola Guarino for giving me an opportunity to spend a couple of weeks at the Laboratory of Applied Ontology. Many thanks to the LOA colleagues Laure Vieu, Alessandro Oltramari, and Stefano Borgo for a fruitful collaboration on the topic of conceptual consistency.

My thanks also go to the researchers from all around the world who have directly contributed to this work. I very much thank Tonio Wandmacher for being my guide into the world of distributional semantics and for constantly challenging my trust in inference-based approaches. I am grateful to Johan Bos, the developer of the Boxer and Nutcracker systems, who helped me to organize experiments involving these systems. I would like to thank Anselmo Peñas for collaborating with me on the issue of paraphrasing noun dependencies. I thank Michael McCord for making the ESG semantic parser available for my experiments. I thank Helmar Gust who agreed to write a review of my thesis.

Concerning the financial side, I would like to thank the German Academic Exchange Service (DAAD) for awarding me a three-year graduate scholarship. I also thank the Doctorate Programme at the University of Osnabrück for supporting my conference and scientific trips financially.

I would like to thank Johannes Dellert, Ilya Oparin, Ulf Krumnack, Konstantin Todorov, and Sascha Alexeyenko for valuable comments, hints, and discussions. Special thanks to Ilya for continually asking me when my thesis was going to be finished.

I am grateful to Irina V. Azarova who gave me a feeling of what computational linguistics really is.

I express my particular gratitude to my parents Andrey and Elena for their continued support and encouragement, which I was always able to count on.

Finally, I sincerely thank my husband Fedor who has greatly contributed to the realization of this book. Thank you for valuable discussions, an introduction to statistical data processing, manifold technical and software support, cluster programming necessary for large-scale experiments, and all other things, which cannot be expressed in words.

E.O., November 2011, Los Angeles

Contents

Foreword
Acknowledgments
List of Figures
List of Tables
List of Algorithms

1. Preliminaries
  1.1 Introduction
  1.2 Objectives
  1.3 How to Read This Book

2. Natural Language Understanding and World Knowledge
  2.1 What is Natural Language Understanding?
  2.2 Representation of Meaning
    2.2.1 Meaning Representation in Linguistic Theories
    2.2.2 Linguistic Meaning in Artificial Intelligence
  2.3 Shared Word Knowledge for Natural Language Understanding
    2.3.1 Linguistic vs. World Knowledge
    2.3.2 Natural Language Phenomena Requiring Word Knowledge to be Resolved
  2.4 Concluding Remarks

3. Sources of World Knowledge
  3.1 Lexical-semantic Resources
    3.1.1 Hand-crafted Electronic Dictionaries
    3.1.2 Automatically Generated Lexical-semantic Databases
  3.2 Ontologies
    3.2.1 Foundational Ontologies
    3.2.2 Domain-specific Ontologies
  3.3 Mixed Resources
    3.3.1 Ontologies Learned from Text
    3.3.2 Ontologies Learned from Structured Sources: YAGO
    3.3.3 Ontologies Generated Using Community Efforts: ConceptNet
  3.4 Concluding Remarks