ebook img

A Real World Implementation of Answer Extraction PDF

6 Pages·0.08 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview A Real World Implementation of Answer Extraction

A Real World Implementation of Answer Extraction∗ DiegoMolla´ Aliod,JawadBerri, MichaelHess ComputationalLinguisticsGroup Department ofComputerScience, UniversityofZurich Winterthurerstr. 190,CH-8057 Zurich, Switzerland 0 {molla,berri,hess}@ifi.unizh.ch 0 0 2 n Abstract over it turned out to be extremely costly to port the sys- a temfromonedomaintoanother,closelyrelated,one(many J InthispaperwedescribeExtrAns,ananswerextraction person-monthsofwork).Infactthesetypesofsystemshave 4 system. Answerextraction(AE)aimsatretrievingthoseex- becomeprototypicalcasesofnon-scalablelaboratoryappli- 1 act passages of a document that directly answer a given cationswithverylimitedimpactonfurtherdevelopments. ] user question. AE is more ambitious than information re- When it comes to processing larger amounts of texts L trieval and information extraction in that the retrieval re- there have been around only two serious contenders up C sults are phrases, not entire documents, and in that the until now: Information Retrieval and Information Extrac- s. queriesmaybearbitrarilyspecific.Itislessambitiousthan tion. Unfortunately, both techniques have serious draw- c full-fledgedquestionansweringinthattheanswersarenot backs. Standard information retrieval (IR) techniques al- [ generatedfromaknowledgebasebutlookedupinthetextof low arbitraryqueriesoververylargedocumentcollections 1 documents. ThecurrentversionofExtrAnsisabletoparse (many gigabytes in size) covering arbitrary domains but v unedited Unix “man pages”, and derive the logical form they usually retrieve entire documents (this holds true for 0 of their sentences. User queries are also translated into traditionalsystemssuchasSMART[9]aswellasforprob- 1 0 logicalforms. Atheoremproverthenretrievestherelevant abilistic ones such as SPIDER [11]). However, this is un- 1 phrases,whicharepresentedthroughselectivehighlighting helpfulifdocumentsaredozens,orhundreds,ofpageslong. 0 intheircontext. Sometimes the techniques of IR are also used to retrieve 0 individualpassages of documents(one or more sentences, 0 paragraphs)(cf. [10]). Insuch cases the numberof search / s termsfoundinagivensentence,togetherwiththeirdensity c 1. Answer Extraction: The Core Idea : (intermsofclosenessinasentence),isusedtofindrelevant v sentences. i X One ofthe fieldswherenaturallanguageunderstanding Unfortunately,allIRtechniques(whetherappliedtoen- technology has failed to deliver convincing results is text r tiredocumentsortoindividualpassages)haveanumberof a based question answering. Systems which read texts, as- limitationsthatmakethemunsuitableforcertainimportant similatetheircontent,andanswerfreelyphrasedquestions applications. First, theytakeintoaccountonlythecontent about them would be very useful in a wide variety of ap- words of a document (all the function words are thrown plications, particularly so if questions and texts could be away).Second,inmostcasesonlythestemofsuchwordsis writteninunrestrictedlanguage.Theywouldbetheperfect used(andthisstemisusuallynotderivedbyapropermor- solutiontothe problemof informationoverloadin the age phologicalanalysisbutbymeansofsomekindofstemmer of the World Wide Web. However, as the situation is to- algorithm,inevitablyresultinginnumerousspuriousambi- day(andwillremainforalongtimetocome),suchsystems guities). Finally, andmostimportantly,theresultingterms can be implemented only in very small domains, for ex- aretreatedasisolateditemswhoseunorderedcombination tremelysmallamountsoftext,andwithveryhighdevelop- is used as content model of the original document. This mentcosts. Onesuchsystem, LILOG([4]),absorbedwell holdsforBooleansystemsaswellasforvectorspacebased inexcessof60person-yearsofworkandcould,initsfinal systems. Inevitably,neithermodelcan,assuch,distinguish stage, still treat merely a few dozen pages of text. More- theconceptof“computerdesign”fromthatof“designcom- ∗This research is funded by the Swiss National Science Foundation, puter”(lostorderinginformation),ortheconceptof“export projectnr.1214-45448-95 fromGermanytotheUK”fromthatof“exportfromtheUK toGermany”(lostfunctionwordinformation). True, most recall is vital (technical manuals typically explain things systems can use phrasal search terms (such as ”computer only once), and sometimes retrieval time is mission criti- design”),tobefoundasawholeinthedocuments,butthen cal(retrievinginformationaboutasystem abouttogetout a numberof relevantdocuments(such as those containing of control). What is needed in such situations is a system ”designofcomputers”)willnolongerberetrieved.Allthis thatpinpointstheexactphrase(s)inadocument(collection) alsoholdsforthe(few)passageretrievalsystemsdescribed fromwhosemeaningwe caninferthe answertoa specific intheliterature(suchasthesystemdescribedin[10]). question. ThisisthecoreideaofAnswerExtraction(AE). Information extraction (IE) techniques do not suffer Sinceweneedtodeterminethemeaningofsentences(ques- from the same shortcomings. They are similar to IR sys- tionsandtexts)wemustuse(alimiteddegreeof)linguistic temsinthatthey,too, are suitableforscreeningverylarge (syntactic and semantic) information, which is expensive, textcollections(ofbasicallyunlimitedsize,suchasstreams but on the other hand the texts to be processed are mod- of messages) covering a potentially wide range of top- eratelysized(somehundredsofkilobytes,sometimesafew ics. However,theydifferfromIRsystems in thattheynot megabytes),andtheytypicallycoveraverylimiteddomain. onlyidentifycertainmessagesinsuchastream(thosethat This makes Answer Extraction a realistic compromise be- fall into a number of specific topics) but also extractfrom tween full question answering on the one hand, and mere thosemessageshighlyspecificcontentdata. Typicalexam- informationextractionorinformationretrievalontheother. plesarenewswirereportsdescribingterroristattacks(where We will describe an Answer Extraction system, theyextractthe informationas to whoattackedwhomand “ExtrAns”, for questions about (a subset of) the on-line how and when, what was the outcome of the attack etc.) Unix manual (the so-called “man pages”). Although the or newswire reports on management succession events in system is, for the time being, functional only as a proto- newswirebusinessreports(withdataonwhoresignedfrom typeitcancopewithuneditedtextandarbitraryquestions, whatpost in which company,who is successor etc.). This withperformancedegradinggracefullyifinput(documents predefined information is placed into a template, or data orquestions)cannotbeanalysedcompletely.Itisincremen- base record,definedforthe differentrolefillers of a given tallyextensibleinthesensethatrefinementsofthegrammar typeofreport. and/orthesemanticcomponentautomaticallyimprovepre- cisionandrecall,withouttheneedtochangeanyothercom- Clearly, this kind of information is much more precise ponentsofthesystem. andspecificthanwhatisconsideredbyIRsystems. Onthe otherhand,IEsystemsdonotallowforarbitraryquestions (asIRsystemsdo). Theymerelyallowforasmallnumber 2.Requirements and components ofpre-definedinformationframesto be filled. Worse still, the “Message Understanding Conferences”, which have GiventhefactthatanAnswerExtractionsystemshould beendrivingdevelopmentinthisareasince1987(thelatest beabletocopewithunrestrictedtext,itneedsaveryreliable so far, with published proceedings, is MUC-6 [2]), put so tokeniser, a grammar of considerable coverage, a reason- muchemphasisonverylargetextvolumesthatmostofthe ablyrobustparser,somewayofdealingwithambiguities,a participatingsystemsthathadused,atfirst,athoroughlin- modulethatcansubjectevenfragmentsofsyntaxstructures guisticanalysishadtoabandonitandadoptaveryshallow toasemanticanalysis,andasearchenginecapableofusing approachinstead,simplybecauseofrun-timerequirements theresultingknowledgebase. forsuch volumesof data (e.g. [1]). Thisapproach,which In the following we will describe merely three compo- isnowtakenbymostsystemstakingpartinMUCs,makes nents of the system in some detail. First, we will point thesystemsincreasinglylessgeneral. outthatpreprocessingtechnicallanguagegoeswellbeyond However,thereisagrowingneedtodayforsystemsthat whatatypicaltokeniserdoes.Wewillnotdescribethesyn- arecapableoflocatinginformationintextsnotrunninginto taxanalysismoduleforwhichweuseandextendanexisting the gigabytes but which should show very high precision dependencyorientedsystemthatcomeswithafullformlex- and recall and which should furthermoreallow arbitrarily icon,agrammar,andaparser,viz. SleatorandTemperley’s phrased questions. Moreover they should be able to cope “LinkGrammar”[12].Ithascertainbuilt-incapabilitiesfor with documents written in syntactically unrestricted natu- robustparsing,whichwesupplementbyafall-backstrategy ral language whereas the domain of the texts is normally thatturnsunrecognisedconstituentsintokeywords(thusre- quiterestricted. Examplesforsuchsystemsareinterfaces sortingtoanIR-typebehaviour). Second,wewilldescribe to machine-readable technical manuals, on-line help sys- the design principles for the semantic representations de- tems for complexsoftware, help desk systems in large or- rivedfromthe(veryspecifictypeof)syntaxstructurespro- ganisations,andpublicinquirysystemsaccessibleoverthe ducedbyLinkGrammar. We willnotexplainindepththe Internet. Forthesetasks,veryhighprecisionofretrievalis disambiguationmodule,forwhichweadoptandextendthe mandatory(queriesmaybeveryspecific),oftennearperfect approach put forward by Brill and Resnik [3]. Third, we willshowhowwecopewiththesyntacticambiguitiesthat This type of information is extracted from the format- surviveall our disambiguationactivities. We will also ex- ting commands and added to the tokens for later modules plainthesearchstrategyverybriefly. touse(e.g. “eject”,whenusedasthenameofacommand, isturnedinto“eject.com”,and“filename”,whenusedasan 3. Preprocessing technical language argument,into“filename.arg”). 3.2.Documentstructureanalysis The analysis of technical language is, in general, con- siderably simpler than that of domain unspecific language The formattinginstructionsin the Unix man pages are, (newspapersetc.)butasfaraspreprocessingisconcernedit unfortunately, used in a fairly unsystematic fashion (these isfarmoredemanding.Thisholds,inparticular,fortokeni- pageswere written by dozensof differentpersons). In or- sation,normalisation,anddocumentstructureanalysis. dertoextractadditionalinformationabouttokensfromthe formatting(seeabove),thetokenisermustmakeupforthese 3.1.Tokenisationandnormalisation inconsistenciesin the source texts. It doesso by perform- ing a considerableamountof documentstructureanalysis. In general, tokenising a text means merely identifying It has, for instance, to collect all command names and ar- word forms and sentences. However, in highly technical gument names from the SYNOPSIS and NAME sections documentssuch as the Unix man pages, this may become ofeachmanualpagetobesurethatitwillrecogniseallof a formidable task. Apart from regular word forms, the theminthebodyoftheDESCRIPTIONsection,eveniffor- ExtrAns tokeniser has to recognise all of the following as mattedincorrectly.Thus,processingamanpagebecomesa tokensandrepresentthemasnormalisedexpressions: caseofprocessingeachofitssectionsinaparticularway. Commandnames: eject,nice(problem:identifyregular wordswhenusedasnamesofcommandsinsentenceslike 4.Logical forms “ejectisusedfor...”,asopposedtotheirstandarduse,asin “Itisnotrecommendedtophysicallyejectmedia...”). A major property of ExtrAns is that the textual infor- Pathnamesandabsolutefilenames: /usr/bin/X11; mation is converted into existentially closed formulae of usr/5bin/ls,/etc/hostname.le(problems:leading,trailing logic,whichencodethemaincontentrelationshipsbetween andinternalslashes,numbersandperiods). the words of the sentences. All formulae are existentially Optionsofcommands: -C,-ww,-dFinUv(problem: quantified since, for retrieval purposes, all entities men- identifywhereasequenceprecededbyadashisanoption tioned in a document can be assumed to exist, as generic andwherenot,asin“... whosenameendswith.gz,-gz,.z, entities, in the universe of discourse shared by writer and -z, zor.Zandwhichbegins...”). reader.Verbs,nouns,adjectives,andadverbsthusintroduce entitiesintheuniverseofdiscoursewhichcanbereferenced Namedvariables: filename1,device,nickname later.Inparticular,theverb“copy”in“cpcopiesgoodfiles” (problems:identifywordsusedasnamedvariables,mostly introduces, in the predicate evt(copy,e1,[c1,f1]), asargumentsofcommandsasin“... thefirstmmisthe anentitye1representingtheconceptofcopying. (e1can hournumber;ddistheday...”) beseenasareifiedeventualityinthesenseof[8]).Weapply Specialcharactersaspartsoftokens: AF UNIX, thisnotionofreificationtootherpredicatesalso. Thenoun sun path,ˆS(CTRL-S),KR,C++,name@domainor%, “cp”introducesapredicateobject(cp,o1,c1),where %%(asin: “Asingle%isencodedby%%.”),various c1representsthecommandcpitselfwhileo1standsforthe punctuationmarks(asin:“... correspondingtocat? or conceptofc1beingofthetypecp. Similarly,theadjective fmt?”,orin“/usr/man/man?”,“<signal.h>”,or “good”introduces, in prop(good,p1,f1),a new con- “[host!...host!]host!username”) cept,p1,viz. theconceptoff1havingthepropertygood. Theseadditionalentities(o1,p1)canbeusedtomodelin- Normalisingsuchtokensmeans,amongotherthings,to tensionalconstructionslike “X is an alleged copy”, where appropriatelymarkspecialtokenssuchascommandnames theconceptofX’sbeingacopyisqualified(asopposedto, (otherwise the parser chokes on them). Luckily, the Unix say,sayingthatXisacopyandXisalsoalleged),or“Yis man pages contain a considerable amount of useful infor- palegreen”,whereY’sbeinggreenismodified. mation beyond the purely textual level, namely the infor- As a basic knowledge representation language we use mationconveyedbytheformattingcommands. Thuscom- Horn Clause Logic, the subset of Predicate Logic which mandnamesare,asarule,printedinboldface,andexpres- can be directly handled by Prolog. We adopt a mixed- sionsusedasvariables,initalics,asin levelontology(largelyfollowing[5]): For eachmainverb compress[-cfv][-bbits][filename...] we create one fixed-arity predicate while its (obligatory) complements and (non-obligatory) modifiers result in ad- aslogicalimplicationbutasaregularpredicateif( , ). ditionalpredicates. The resultingexpressionsare supplied Thesameholdsfornegation,whichisintroducedasanother with pointers to the sentences and individual word forms regularpredicatenot( ). Furthersimplificationsarethat fromwhichtheywerederived. plurals,modality,tense,andquantificationareignored. As Thefollowingtwoexamplesillustrateanadditionaldis- a result, there is an obviousdecrease in precision, but the tinctionwemake: retrievalresults are perfectlysensible as a rule. In fact, in some cases, even if a retrieved sentence does not strictly (1) “cp copies the contents of filename1 onto file- answer the answer to the query, it does provide useful in- name2” formationtotheuser. Forexample,iftheuserasks“Which commandscan print warning messages?” (note the plural (2) “Iftheoperationfails,ejectprintsamessage” and the modal), the system can easily retrieve (2), which containsa conditionalwhose antecedentcannotbe proven In the first examplethe copyingeventis asserted to ac- to hold, but still the sentence by itself is useful as an an- tuallyhold(inthe genericworldofmanpages),butthisis swer. Hadweusedamoredetailedlogicalform,thesystem notthecaseforeitheroftheactionsinthesecondexample wouldhaveto resortto possiblycomplexinferencesinor- (failing and printing) since both of them are introducedin dertoretrievethesameresult. Researchinthisareaisnot the scope of the conditional. Clearly we want passage (2) yetfinished,however,anditispossiblethatmorecomplex to be retrievable for appropriate queries (see example be- logicalformsareusedinfurtherprototypesofExtrAns. low) but presumablywe do not want it to be shown when Finally,notealsothatwehaveusedinthisrepresentation weaskaboutwaysfortheusertointentionallyprintames- someoftheinformationextractedbythetokeniserfromthe sage, as opposed to the printing of error messages, which typography (and, indirectly, from the document structure) is merely a side-effect. For this reason we further extend of the text. The fact that “eject” is the name of a com- the ontological basis of our system concerning eventuali- mand (rather than a regular English word) results in the ties and state that those eventualities that actually hold in creation of an additionalentry object(command,x7), theworldareexplicitlymarkedassuch. Allothersarepre- andsimilarly“cp”isrecognisedasreferringtoacommand, sumedtomerelyexistintheuniverseofdiscourse.1 Weget too. Without recourse to this type of information the first thus,for(1),thefollowingrepresentation2 sentencewouldbecomeunavailableforquerieslike“What commands copy files?”, and the second sentence would holds(e1)/s1. object(cp,o1,x1)/s1. havetobeconsideredungrammatical. object(command,o2,x1)/s1. In orderto answer questionsthe standardsearch proce- evt(copy,e1,[x1,x2])/s1. dureofrefutationresolution(asimplementedinProlog)is object(content,o3,x2)/s1. nowusedtofindallproofsofthequeryovertheknowledge object(filename1,o4,x3)/s1. base. Thequery object(file,o5,x3)/s1. of(x2,x3)/s1. object(filename2,o6,x4)/s1. (3) Whichcommandcopiesfiles? object(file,o7,x4)/s1. onto(e1,x4)/s1. thusbecomes,aftercomputingthelogicalformandcheck- wherethecopyingeventissaidtohold(holds(e1)),and, ingforsynonyms(seenextsection) for(2) ?- findall(S,(object(command,,X)/S, object(eject,o8,x7)/s2. (evt(copy,E,[X,Y])/S; object(command,o9,x7)/s2. evt(duplicate,E,[X,Y])/S; evt(print,e8,[x7,x11])/s2. object(file,,Y)/S), R). object(message,o10,x11)/s2. andreturns,inR=[S1,S2,...],thereferencestotherel- if(e8,e5)/s2. evantsentences,oneofwhichis(1). object(operation,o11,x5)/s2. evt(fail,e5,[x5])/s2. 5.A fall-back searchstrategy whereneitherthefailingnortheprintingevent(e5ande8) is marked that way. Actual existence may be inferred or Another important feature of ExtrAns is the linguisti- blockedorletunspecified,accordingtocontext[6]. cally aware fall-back search strategy it uses. For that ef- As we can see above, the logical forms generated are fect,wearedevelopingacustom-made,WordNet-stylethe- simplified. The conditional, for example, is not encoded saurus [7] which contains two types of relations between 1Notethatwedonotdothisyetforobjectsandproperties,though. the concepts in the world of Unix man pages: synonyms 2Thepointerstowordpositionsarenotshownhere. andhyponyms. Theoverallsearchalgorithmisasfollows. First, allthe which are retrieved several times are highlighted with a synonyms of all the content words used in the query are brighter colour than others. For example, consider the added to the query from the start. If the query does not string “create the destination directory” in the first hit in return enough answers, hyponyms of all search terms will figure1 (“install.1/DESCRIPTION/1”). Thisstring is part be used. If this, too, gives us an insufficient number of of all interpretations of the (ambiguous) sentence in “in- hits,allthelogicaldependenciesbetweentermsarebroken. stall.1/DESCRIPTION/1” and was thus used for all the Atthisstage, a querylike “How canI createa directory?” proofs of the user query. As a consequence, it is high- wouldretrieve“mkdircreatesanewdirectoryfile”. Finally, lighted with highest intensity in the answer window. This ifeverythingelse fails, we gointokeywordsmode. Inthis means that, in formal terms, we interpret unambiguity as mode,nouns, verbs, adjectivesandadverbsare selected in relevance. Unconventionalasthis maybe, it seemsa very boththe queryandthe data. Then, everywordselected in helpfulconceptinthefaceofunreducibleambiguities. the query is matched against those in the document. The In addition, it is possible to access the complete man- resultisdisplayedaccordingtothenumberofwordsinthe ual page containing the sentence by clicking on the man- sentencethatmatchit. Notethatthiskeywordsmodeisstill ual page name at the left of each sentence. The manual more powerfulthan the standard IR approachsince (i) we pagewillshowthesamemulti-colouredselectivehighlight- makegooduseofthepart-of-speechofthewordstodecide ing, thus enablingthe user to spotthe relevantsentence at whatis elligible as a keyword, (ii) we make use of the in- once and to determine even better, by inspecting the con- formationobtainedbythetokeniserfromtheformattingof text,whetherthesentencecontainsinfactananswertothe thedocument(suchasnamesofcommands,typesofargu- question.Thiswayofpresentingsearchresultsmakeseven ments,etc.),and(iii)wealsomakeuseofthetextualstruc- multipleambiguitiesfairlyunobtrusive. ture of the document(all index terms have to occur in the samesentenceinordertobeconsidered). 7.Conclusions and further research 6. Presenting (possibly ambiguous)results Comparisonwithexistingapproaches. Wehavebuilta small prototype that currently processes 30 UNIX manual A problem that our system (as every NLP system) had pagesandallowstheusertoaskquestionsinplainEnglish. to confront was that of ambiguities. Sleator and Temper- Sincetheamountofdataisstillsmall,astatisticallymean- ley’s[12]parserdoesnottrytoresolvesyntacticorseman- ingfulevaluationisoutofthequestion. Moreover,itisun- tic ambiguities, and a long sentence may have hundreds, clearhow we shouldcomparethe performanceof oursys- or even thousands, of different parses. ExtrAns tries to tem with that of standard IR systems. The standard mea- resolve (syntactic) ambiguities in two steps. First of all, suresofrecallandprecisionforthosesystemsarebasedon somehand-craftedrulesfilter outthe moststraightforward experts’judgementconcerningtherelevanceofentiredoc- cases of spurious ambiguities. An example of such a rule uments whereas, in ExtrAns, we would have to determine is: “A prepositional phrase headed by of can attach only the relevance of individualphrases. However, in an infor- to the immediately precedingnoun or noun coordination.” malmannerwecancompareourapproachandthestandard In a second step, we adopt Brill and Resnik’s [3] prepo- IRapproachbytelling ExtrAnsto gointo keywordsmode sitionalphrasedisambiguationapproach,trainedwith data fromtheverybeginning.Itwouldthenregularlyfindacon- extractedfrommanualpages. Withthehelpofthisdisam- siderablenumberofpassageswhicharefarfromrelevantto biguatorwe can use statistical data to resolve some of the the query. For example, the query used in figure 1, “How prepositionalphraseattachmentambiguities,amajorsource canIcreateadirectory?”,wouldnowfind(inadditiontoall ofstructuralambiguity. therelevantpassages)sentencessuchas“ln createsan ad- After these two steps, the number of ambiguities will ditionaldirectoryentry,calledalink,toafileordirectory.” bereducedbutsomewillnormallysurvive,partlybecause (which is not about the creation of directories) as well as there are sources of ambiguities which cannot be treated the equally irrelevant “A hard link is a standard directory with Brill and Resnik’s algorithm. If a sentence has sev- entryjustliketheonemadewhenthefilewascreated.” Ig- eral irreducible interpretations, ExtrAns stores all of them noringthesyntactic, andhencesemantic, relationshipsbe- initsdatabase. Whentheuserasksaquery,the samesen- tween the individualwordsresulted, predictably, in a con- tencemaythereforeberetrievedseveraltimes,viadifferent siderablelossinprecision. Sinceevenourkeywordsmode proof paths. The different proofs may result in the high- isfarmorerestrictivethanthestandardIRsearchmodel(see lighting of differentwords in the same sentence. ExtrAns above),anystandardIRsystemisboundtoshowconsider- handles this mismatch of highlighted words by superim- ablylowerprecisionthanourAEapproach. posing all of the highlights of a given sentence, using a In sum, we hope to haveshown that the concept of an- graded colouring scheme in such a way that those parts swer extraction is very useful, and that it requires a rela- Figure1.Theresultofthequery“HowcanIcreateadirectory?” (originalincolour) tivelylimitedamountoflanguageprocessing. Wethinkwe [3] E. Brill and P. Resnik. A rule-based approach to preposi- couldalsoshowthatitissurprisinglyeasyforuserstocope tional phrase attachment disambiguation. In Proceedings with ambiguities in documents if they are fused, graded, ofthe15thInternationalConferenceonComputationalLin- andpresentedincontext. guistics(COLING’94),volume2,pages1998–1204,Kyoto, Japan,1994. Featurestoimprove. Forthesystemtobetrulyusefulwe [4] O.HerzogandC.R.Rollinge,editors. TextUnderstanding must clearly increase the number of manual pages which inLILOG. Springer-Verlag,Berlin,1991. can be analysed. We must then refine the system’s treat- [5] M. Hess. Mixed-level knowledge representations and ment of ungrammatical text. Currently, it converts words variable-depth inference in natural language processing. which cannot be analysed into isolated keywords, and we International Journal on Artificial Intelligence Tools, should refine this method so that we can process individ- 6(4):481–509,1997. [6] J.R.Hobbs. Ontologicalpromiscuity. InProceedingsofthe ual phrases if a sentence cannot be parsed in its entirety. 23rdAnnualMeetingoftheAssociationforComputational Also,insomesentences,thenumberofunreduciblesyntac- Linguistics, pages 61–69. University of Chicago, Associa- tic analysesstill is overwhelming,and itwill be necessary tionforComputationalLinguistics,1985. torefineourmethodstodisambiguateandfilterouttheim- [7] G.A.Miller,R.Beckwith,C.Fellbaum,D.Gross,andK.J. plausible meanings, possibly through the use of semantic Miller. Introductiontowordnet:anon-linelexicaldatabase. information.Wemustalsoaddsomecapabilityofresolving InternationalJournalofLexicography,3(4):235–244,1990. pronounsandanaphoricfullnounphrases.Finally,wemust [8] T.Parsons. EventsintheSemanticsofEnglish: Astudyin extendthecurrentinferencingtechniquestoincreaserecall. subatomicsemantics. MITPress,Cambridge,MA,1990. We currentlyintegratesynonymy,hyponymyandconjunc- [9] G.Salton. AutomaticTextProcessing: thetransformation, analysis,andretrievalofinformationbycomputer.Addison tion distributivity, among others, but we still need to add Wesley,NewYork,1989. moreinferencesandextendthethesaurus. [10] G.Salton,J.Allan,andC.Buckley. Approachestopassage retrieval in full text information systems. In ACM SIGIR References conferenceonR&DinInformationRetrieval,pages49–58, 1993. [11] P.Scha¨uble. SPIDER:amultiuserinformationretrievalsys- [1] D.E.Appelt,J.R.Hobbs,J.Bear,D.Israel,M.Kameyama, tem for semistructured and dynamic data. In ACM SIGIR andM.Tyson. SRI:descriptionoftheJV-FASTUSsystem conference on R&D in Information Retrieval, pages 318– used forMUC-5. InProceedingsof thefifthMessage Un- 327,1993. derstandingConference(MUC-5).MorganKaufmann,Los [12] D. D. Sleator and D. Temperley. Parsing English with a Altos,CA,1993. linkgrammar.TechnicalReportCMU-CS-91-196,Carnegie [2] ARPA. Proceedings of the Sixth Message Understanding MellonUniversity,SchoolofComputerScience,1991. Conference (MUC-6). Morgan Kaufmann, Los Altos, CA, 1996.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.