PhraseNet: Towards Context Sensitive Lexical Semantics∗

Xin Li†, Dan Roth†, Yuancheng Tu‡
Dept. of Computer Science†, Dept. of Linguistics‡
University of Illinois at Urbana-Champaign
{xli1,danr,ytu}@uiuc.edu

∗ Research supported by NSF grants IIS-99-84168, ITR-IIS-00-85836 and an ONR MURI award. Names of authors are listed alphabetically.

Abstract

This paper introduces PhraseNet, a context-sensitive lexical semantic knowledge base system. Based on the supposition that semantic proximity is not simply a relation between two words in isolation, but rather a relation between them in their context, English nouns and verbs, along with the contexts they appear in, are organized in PhraseNet into consets; consets capture the underlying lexical concept and are connected with several semantic relations that respect contextually sensitive lexical information. PhraseNet makes use of WordNet as an important knowledge source. It enhances a WordNet synset with its contextual information and refines its relational structure by maintaining only those relations that respect contextual constraints. The contextual information allows PhraseNet to support more functionalities than WordNet. Natural language researchers as well as linguists and language learners can access PhraseNet with a word token and its context to retrieve relevant semantic information. We describe the design and construction of PhraseNet and give preliminary experimental evidence of its usefulness for NLP research.

1 Introduction

Progress in natural language understanding research necessitates significant progress in lexical semantics and the development of lexical semantic resources. In a broad range of natural language applications, from prepositional phrase attachment (Pantel and Lin, 2000; Stetina and Nagao, 1997) and co-reference resolution (Ng and Cardie, 2002) to text summarization (Saggion and Lapalme, 2002), semantic information is a necessary component of the inference, providing a level of abstraction that is needed for robust decisions.

Inferring that the prepositional phrase in "They ate a cake with a fork" has the same grammatical function as that in "They ate a cake with a spoon", for example, depends on the knowledge that "cutlery" and "tableware" are hypernyms of both "fork" and "spoon". However, the noun "fork" has five senses listed in WordNet, and each of them has several different hypernyms. Choosing the correct one is a context-sensitive decision.

WordNet (Fellbaum, 1998), a manually constructed lexical reference system, provides a lexical database along with semantic relations among the lexemes of English and is widely used in NLP tasks today. However, WordNet is organized at the word level, and at this level English suffers from ambiguity: stand-alone words may have several meanings and take on relations (e.g., hypernyms, hyponyms) that depend on those meanings. Consequently, there are very few success stories of automatically using WordNet in natural language applications. In many cases, reported (and unreported) problems are due to the fact that WordNet enumerates all the senses of polysemous words; attempts to use this resource automatically often result in noisy and non-uniform information (Brill and Resnik, 1994; Krymolowski and Roth, 1998).

PhraseNet is designed based on the assumption that, by and large, semantic ambiguity in English disappears when the local context of words is taken into account. It makes use of WordNet as an important knowledge source and is generated automatically using WordNet and machine-learning-based processing of large English corpora.
It enhances a WordNet synset with its contextual information and refines its relational structure, including relations such as hypernym, hyponym, antonym and synonym, by maintaining only those links that respect contextual constraints. However, PhraseNet is not just a functional extension of WordNet. It is an independent lexical semantic system, allied with proper user interfaces and access functions, that will allow researchers and practitioners to use it in applications.

As stated before, PhraseNet is built on the assumption that linguistic context is an indispensable factor affecting the perception of semantic proximity between words. In its current design, PhraseNet defines "context" hierarchically, with three abstraction levels: abstract syntactic skeletons, such as

[(S)−(V)−(DO)−(IO)−(P)−(N)]

which stands for Subject, Verb, Direct Object, Indirect Object, Preposition and Noun (Object) of the Preposition, respectively; syntactic skeletons whose components are enhanced by semantic abstraction, such as [Peop−send−Peop−gift−on−Day]; and finally concrete syntactic skeletons from real sentences, such as [they−send−mom−gift−on−Christmas].
Intuitively, while "candle" and "cigarette" would score poorly on semantic similarity without any contextual information, their occurrence in sentences such as "John tried to light a candle/cigarette" may highlight their connection with the process of burning. PhraseNet captures such constraints from the contextual structures extracted automatically from natural language corpora and enumerates word lists with their hierarchical contextual information. Several abstractions are made in the process of extracting the context in order to prevent superfluous information and support generalization.

The basic unit in PhraseNet is a conset: a word in its context, together with all relations associated with it. In the lexical database, consets are chained together via their similar or hierarchical contexts. By listing every context extracted from large corpora, as well as all the generalized contexts based on those attested sentences, PhraseNet will have many more consets than WordNet has synsets. However, the organization of PhraseNet respects the syntactic structure together with the distinction of senses of each word in its corresponding contexts.

For example, rather than linking all hypernyms of a polysemous word to a single word token, PhraseNet connects the hypernym of each sense to the target word in every context that instantiates that sense. While in WordNet every word has an average of 5.4 hypernyms, in PhraseNet the average number of hypernyms of a word in a conset is 1.5 (these statistics are taken over 200,000 words from a mixed corpus of American English).

In addition to querying WordNet semantic relations to disambiguate consets, PhraseNet also maintains frequency records of each word in its context to help differentiate consets, and makes use of a defined similarity between contexts in this process (see details in Sec. 3).

Several access functions are built into PhraseNet that allow retrieving information relevant to a word and its context. When accessed with words and their contextual information, the system tends to output more relevant semantic information, due to the constraints set by their syntactic contexts.

While still in preliminary stages of development and experimentation, and with many functionalities still missing, we believe that PhraseNet is an important effort towards building a contextually sensitive lexical semantic resource that will be of much value to NLP researchers as well as linguists and language learners.

The rest of this paper is organized as follows. Sec. 2 presents the design principles of PhraseNet. Sec. 3 describes the construction of PhraseNet and the current stage of the implementation. An application that provides a preliminary experimental evaluation is described in Sec. 4. Sec. 5 discusses some related work on lexical semantic resources, and Sec. 6 discusses future directions within PhraseNet.

2 The Design of PhraseNet

Context is an important notion in PhraseNet. While context may mean different things in natural language, much previous work in statistical natural language processing has defined "context" as an n-word window around the target word (Gale et al., 1992; Brown et al., 1991; Roth, 1998). In PhraseNet, "context" has a more precise definition that depends on the grammatical structure of a sentence rather than simply on counting surrounding words.

We define "context" to be the syntactic structure of the sentence in which the word of interest occurs. Specifically, we define this notion at three abstraction levels. The highest level is the abstract syntactic skeleton of the sentence; that is, one of the possible combinations of six syntactic components, where some components may be missing as long as the structure comes from a legitimate English sentence. The most complete form of the abstract syntactic skeleton is:

[(S)−(V)−(DO)−(IO)−(P)−(N)]   (1)

which captures all six syntactic components in the sentence, namely Subject, Verb, Direct Object, Indirect Object, Preposition and Noun (Object) of the Preposition, respectively; all components are assumed to be arranged to obey English word order.
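To make the three abstraction levels introduced above concrete, the following minimal sketch (ours, for illustration only; PhraseNet's actual data structures are not published, and all names here are hypothetical) represents a context as a six-slot tuple aligned with Eq. 1:

# A minimal sketch (ours, not PhraseNet's implementation) of a context
# as a six-slot structure aligned with Eq. 1. All names are hypothetical.
from dataclasses import dataclass
from typing import Optional, Tuple

ROLES = ("S", "V", "DO", "IO", "P", "N")  # the six components of Eq. 1

@dataclass(frozen=True)
class Context:
    """Each slot is None (missing), an abstract semantic class such as
    'Peop' or 'Day', or a concrete lemma, in English word order."""
    slots: Tuple[Optional[str], ...]

    def skeleton(self) -> Tuple[str, ...]:
        # Highest abstraction level: which components are present.
        return tuple(r for r, s in zip(ROLES, self.slots) if s is not None)

# Lowest level: a concrete, lemmatized instantiation of the skeleton.
concrete = Context(("they", "send", "mom", "gift", "on", "Christmas"))
# Middle level: some components replaced by semantic classes.
general = Context(("Peop", "send", "Peop", "gift", "on", "Day"))

# Both share the same (highest-level) abstract syntactic skeleton.
assert concrete.skeleton() == general.skeleton() == ("S", "V", "DO", "IO", "P", "N")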
The lowest level of context is the concrete instantiation of the stated syntactic skeleton, such as [Mary(S)−give(V)−John(DO)−gift(IO)−on(P)−birthday(N)] and [I(S)−eat(V)−bread(DO)−with(P)−hand(N)], which are extracted directly from corpora, with grammatical lemmatization done during the process; all word tokens are therefore in their lemma form. The middle layer(s) consist of generalized forms of the syntactic skeleton. For example, the first example given above can be generalized as [Peop(S)−give(V)−Peop(DO)−Possession(IO)−on(P)−Day(N)] by replacing some of its components with more abstract semantic concepts.

PhraseNet organizes nouns and verbs into consets, where a conset is defined as a context with all its corresponding pointers (edges) to other consets. The context that forms a conset can be either directly extracted from the corpus or taken at a certain level of abstraction. For example, both [Mary(S)−eat(V)−cake(DO)−on(P)−birthday(N), {p1, p2, ..., pn}] and [Peop(S)−eat(V)−Food(DO)−on(P)−Day(N), {p1, p2, ..., pn}] are consets.

Two types of relational pointers are currently defined in PhraseNet: Equal and Hyper. Both relations are based on the context of each conset. Equal is defined among consets with the same number of components and the same syntactic ordering, i.e., among contexts under the same abstract syntactic structure (the highest level of context as defined in this paper). The Equal relation holds among consets whose contexts have the same abstract syntactic skeleton and differ in only one component at the same position. For example, [Mary(S)−give(V)−John(DO)−gift(IO)−on(P)−birthday(N), {p1, p2, ..., pn}] and [Mary(S)−send(V)−John(DO)−gift(IO)−on(P)−birthday(N), {p1, p2, ..., pk}] are Equal, because they have the same syntactic skeleton, [(S)−(V)−(DO)−(IO)−(P)−(N)], and except for the word in the verb position ("give" vs. "send"), all other five components at corresponding positions are the same. The Equal relation is transitive only with regard to a specific component in the same position. For example, to be transitive with the two example consets above, a third Equal conset must also differ from them only in its verb.

The Hyper relation is also defined for consets with the same abstract syntactic structure. For conset A and conset B with the same syntactic structure, if there is at least one component of the context in A that is a hypernym of the component of B at the corresponding position, and all other components are respectively the same, then A is a Hyper conset of B. For example, both [Molly(S)−hit(V)−Body(DO), {p1, p2, ..., pj}] and [Peop(S)−hit(V)−Body(DO), {p1, p2, ..., pn}] are Hyper consets of [Molly(S)−hit(V)−nose(DO), {p1, p2, ..., pk}].

The intuition behind these two relations is that the Equal relation can cluster a list of words which occur in exactly the same contextual structure; in the extreme case, when the same context in all these Equal consets groups virtually any noun or verb with regard to a specific syntactic component, the Hyper relation can be used for further disambiguation.

To summarize, PhraseNet can be thought of as a graph over consets. Each node is a context, and the edges between nodes are relations defined by the context of each node; they are either Equal or Hyper. The Equal relation can be derived by matching consets and is easy to implement, while building the Hyper relation requires the assistance of WordNet and the defined Equal relation. Semantic relations among words can be generated using the two types of defined edges. For example, it is likely that the target words in all Equal consets related by transitivity have similar meanings; if this does not hold at the lowest level of context, it is more likely to hold at a higher, i.e., more generalized, level. Figure 1 shows a simple example reflecting the preliminary design of PhraseNet.

[Figure 1: The basic organization of PhraseNet. The upward arrow denotes the Hyper relation, and the dotted two-way arrow with a V above it denotes the Equal relation that is transitive with regard to the V component.]
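Read operationally, Equal and Hyper are simple structural tests over aligned context slots. The sketch below is our reading of the definitions above, not PhraseNet's released code; is_hypernym_of is a toy stand-in for a WordNet lookup:

# Sketch of the Equal and Hyper tests as defined above (our reading, not
# the actual PhraseNet code). Contexts are equal-length tuples of fillers.

def equal(a, b):
    """Equal: same abstract skeleton, exactly one differing component."""
    if len(a) != len(b):
        return False
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return len(diffs) == 1

def is_hypernym_of(x, y):
    """Toy stand-in for a WordNet hypernym lookup."""
    toy_hierarchy = {("Peop", "Molly"), ("Body", "nose")}
    return (x, y) in toy_hierarchy

def hyper(a, b):
    """Hyper: at least one component of `a` is a hypernym of the
    corresponding component of `b`; all other components identical."""
    if len(a) != len(b):
        return False
    ups = [(x, y) for x, y in zip(a, b) if x != y]
    return bool(ups) and all(is_hypernym_of(x, y) for x, y in ups)

give = ("Mary", "give", "John", "gift", "on", "birthday")
send = ("Mary", "send", "John", "gift", "on", "birthday")
assert equal(give, send)  # the two consets differ only in the V slot
assert hyper(("Peop", "hit", "Body"), ("Molly", "hit", "nose"))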
After we obtain lists of words with similar meanings based on their contexts, we can build an interaction from such a word list to WordNet and inherit other semantic relations from WordNet. Moreover, each member of a word list can help to disambiguate the other members of the list. It is therefore expected that, with pruning assisted by list members, i.e., disambiguation by truncating the semantic relations associated with each synset in WordNet, the exact meaning in the context, together with other semantic relations such as hypernyms, holonyms, troponyms and antonyms, can be derived from WordNet.

In the next two sections we describe our current implementation of these operations and the preliminary experiments we have done with them.

2.1 Accessing PhraseNet

Retrieval of information from PhraseNet is done via several access functions that we describe below. PhraseNet is designed to be accessed via multiple functions, with flexible input modes set by the user. These functions allow users to exploit several different functionalities of PhraseNet, depending on their goal and the amount of resources they have.

An access function in PhraseNet has two components. The first component is the input, which can vary from a single word token to a word with its complete context. The second component is the functionality, which ranges over simple retrieval and several relational functions modelled after WordNet relations.

The most basic and simplest way to query PhraseNet is with a single word. In this case, the system outputs all contexts the word can occur in, and its related words in each context.

PhraseNet can also be accessed with input that consists of a single word token along with its context information. Context information refers to any of the elements in the syntactic skeleton defined in Eq. 1, namely Subject (S), Verb (V), Direct Object (DO), Indirect Object (IO), Preposition (P) and Noun (Object) of the Preposition (N). The contextual roles S, V, DO, IO, P and N, or any subset of them, can be specified by the user or derived by an application making use of a shallow or full parser. The more information the user provides, the more specific the retrieved information is.

To ease the requirements on the user, say, in case no information of this form is available, PhraseNet will, in the future, have functions that allow a user to supply a word token and some context where the functionality of the word in the context is not specified. See Sec. 6 for a discussion.

Function Name | Input Variables  | Output
PN_WL         | Word [, Context] | Word list
PN_RL         | Word [, Context] | WordNet relations
PN_SN         | Word [, Context] | Sense
PN_ST         | Context          | Sentence

Table 1: PhraseNet access functions, along with their input and output. [, Context] denotes an optional input. PN_RL is a family of functions, modelled after WordNet relations.

Table 1 lists the functionality of the access functions in PhraseNet. If the user inputs only a word token, without any context, each of these functions returns every context the input word occurs in, together with the word list for those contexts; otherwise, the output is constrained by the input context. The functions are described below:

PN_WL takes the optional contextual skeleton and one specified word in that context as input, and returns the corresponding word list occurring in that context or in a higher level of context. A parameter to this function specifies whether we want the complete word list or only those words in the list that satisfy a specific pruning criterion. (This is the function used in the experiments in Sec. 4.)

PN_RL is modelled after the WordNet access functions. It returns all words in those contexts that are linked in PhraseNet by the Equal or Hyper relation. Those words can be used to access WordNet and derive all the lexical relations stored there.

PN_SN is modelled after the semantic concordance in (Landes et al., 1998). It takes a word token and an optional context as input, and returns the sense of the word in that context. Similarly to PN_RL, this function is implemented by appealing to WordNet senses and pruning the possible senses based on the word list determined for the given context.
PN_ST is not implemented at this point, but is designed to output a sentence that has the same structure as the input context but uses different words. It is inspired by work on reformulation, e.g., (Barzilay and McKeown, 2001).

We can envision many ways users of PhraseNet can make use of the retrieved information. At this point in the life of PhraseNet we focus mostly on using PhraseNet as a way to acquire semantic features that aid learning-based natural language applications. This determines our priorities in the implementation that we describe next.

3 Constructing PhraseNet

Constructing PhraseNet involves three main stages: (1) extracting syntactic skeletons from corpora, (2) constructing the core element of PhraseNet, consets, and (3) developing access functions.

The first stage makes use of fully parsed data. In constructing the current version of PhraseNet we used two corpora. The first is the relatively small, 1.1 million-word Penn Treebank corpus, which consists of American English news articles (WSJ) and is fully parsed. The second corpus has about 5 million sentences from TREC-11 (Voorhees, 2002), also containing mostly American English news articles (NYT, 1998), and was parsed with Dekang Lin's Minipar parser (Lin, 1998a).

In the near future we are planning to construct a much larger version of PhraseNet using the TREC-10 and TREC-11 datasets, which cover about 8GB of text. We believe that size is very important here and will add significant robustness to our results.

To reduce ostensibly different contexts, two important abstractions take place at this stage: (1) syntactic lemmatization, to get the lemma of both nouns and verbs in the context defined in Eq. 1 (for data parsed via Lin's Minipar, the lexeme of each word is already output by the parser); and (2) semantic categorization, to unify pronouns, proper names of people, locations and organizations, as well as numbers. This semantic abstraction captures the underlying semantic proximity by mapping multitudinous surface-form proper names onto one representative symbol.

While the first abstraction is simple, the second is not. At this point we use an NE tagger we developed ourselves, based on the approach to phrase identification developed in (Punyakanok and Roth, 2001). Note that this abstraction handles multi-word phrases. While the accuracy of the NE tagger is around 90%, we have yet to experiment with the implications of this additional noise on PhraseNet.

At the end of this stage, each sentence in the original corpora is transformed into a single context, either at the lowest level or as a more generalized instantiation (with named entities tagged). For example, "For six years, T. Marshall Hahn Jr. has made corporate acquisitions in the George Bush mode: kind and gentle." changes to [Peop−make−acquisition−in−mode].
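A rough sketch of this stage-1 abstraction, under our own assumptions (the part-of-speech, NE labels and lemmas are taken as given, standing in for the parser and the NE tagger; this is not the authors' pipeline code):

# Rough sketch of the stage-1 abstraction (our assumptions, not the
# authors' pipeline): lemmatize verbs/nouns and collapse named entities
# and numbers into category symbols such as 'Peop', 'Loc', 'Org', 'Num'.

NE_CLASSES = {"PERSON": "Peop", "LOCATION": "Loc", "ORGANIZATION": "Org"}

def abstract_slot(token, pos, ne_label, lemma):
    """Map one skeleton slot to its abstracted form."""
    if ne_label in NE_CLASSES:      # proper names -> one category symbol
        return NE_CLASSES[ne_label]
    if pos == "CD":                 # numbers -> one symbol
        return "Num"
    return lemma                    # otherwise keep the lemma

# "T. Marshall Hahn Jr. has made corporate acquisitions in the ... mode"
slots = [
    ("T. Marshall Hahn Jr.", "NNP", "PERSON", None),   # S
    ("made", "VBN", None, "make"),                     # V
    ("acquisitions", "NNS", None, "acquisition"),      # DO
    ("in", "IN", None, "in"),                          # P
    ("mode", "NN", None, "mode"),                      # N
]
context = [abstract_slot(*s) for s in slots]
print(context)  # ['Peop', 'make', 'acquisition', 'in', 'mode']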
The second stage of constructing PhraseNet concentrates on constructing its core element: consets. To do that, for each context we collect word lists containing those words that we determine to be admissible in the context (or in contexts that share the Equal relation). The first step in constructing the word lists is to follow the most strict definition: include those words that actually occur in the same context in the corpus. This involves all Equal consets with the transitive property with respect to a specific syntactic component. We then apply to the word lists three types of pruning operations, based on (1) the frequency of word occurrences in identical or similar contexts; (2) categorization of the words in a word list based on clustering all the contexts they occur in; and (3) the relational structure inherited from WordNet: we prune from the word list outliers in terms of this relational structure. Some of these operations are parameterized, and determining the optimal setting is an experimental issue. The three operations are described below; a code sketch illustrating two of them follows the list.

1. Every word in a conset word list has a frequency record associated with it, which records the frequency of the word in its exact context. We prune words with a frequency below k (with the current corpus we choose k = 3). A disadvantage of this pruning method is that it might filter out some words that are appropriate but simply rare. For example, for the partial context [strategy−involve−∗−∗−∗], we have:

[strategy−involve−∗−∗−∗, <DO: advertisement (4), abuse (1), campaign (2), compromise (1), everything (1), fumigation (1), item (1), membership (1), option (3), stock-option (1)>]

In this case, "strategy" is the subject, "involve" is the predicate, and all words in the list serve as the direct object; the number in parentheses is the frequency of the token. With k = 3 we actually get as a word list only <advertisement, option>.

2. There are several ways to prune word lists based on the different contexts words may occur in. This involves a definition of similar contexts and thresholding based on the number of such contexts a word occurs in. At this point, we implement this part of the construction using a clustering of contexts, as done in (Pantel and Lin, 2002): an exhaustive PhraseNet list is intersected with word lists generated from the clustered contexts given by (Pantel and Lin, 2002).

3. We prune from the word list outliers in terms of the relational structure inherited from WordNet. Currently, this is implemented using only the hypernym relation: the hypernym shared by the highest number of words in the word list is kept in the database. For example, by searching for "option" in WordNet, we get its three senses. We then collect the hypernyms of "option" from all the senses, as follows:

05319492 (a financial instrument whose value is based on another security)
04869064 (the cognitive process of reaching a decision)
00026065 (something done)

We do this for every word in the original list and find the hypernym(s) shared by the highest number of words in the original word list. The final pick in this case is the synset 05319492, which is shared by both "option" and "stock-option" as their hypernym.
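The sketch promised above illustrates pruning operations (1) and (3) on the word list from the example (our illustration, not the released PhraseNet code; the hypernym step uses NLTK's WordNet interface, which may differ from the WordNet version the paper used, so the surviving words can differ):

# Sketch of pruning operations (1) and (3) over a conset word list; our
# illustration, not the released PhraseNet code. Requires NLTK with the
# WordNet data installed (nltk.download('wordnet')).
from collections import Counter
from nltk.corpus import wordnet as wn

def freq_prune(wordlist, k=3):
    """(1) Keep words seen at least k times in the exact context."""
    return {w: n for w, n in wordlist.items() if n >= k}

def hypernym_prune(words):
    """(3) Keep the words sharing the hypernym synset that is shared by
    the highest number of words in the list."""
    hypers = {w: {h for s in wn.synsets(w.replace("-", "_"), pos=wn.NOUN)
                  for h in s.hypernyms()}
              for w in words}
    counts = Counter(h for hs in hypers.values() for h in hs)
    if not counts:
        return set()
    best, _ = counts.most_common(1)[0]
    return {w for w, hs in hypers.items() if best in hs}

# The [strategy-involve-*-*-*] direct-object list from the example above.
wordlist = {"advertisement": 4, "abuse": 1, "campaign": 2, "compromise": 1,
            "everything": 1, "fumigation": 1, "item": 1, "membership": 1,
            "option": 3, "stock-option": 1}
print(freq_prune(wordlist))  # {'advertisement': 4, 'option': 3}
# Words sharing the most common hypernym, e.g. option / stock-option:
print(hypernym_prune(["option", "stock-option"]))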
The third stage is to develop the access functions. As mentioned before, while we envision many ways users of PhraseNet can use the retrieved information, at this preliminary stage we focus mostly on using PhraseNet as a way to supply abstract semantic features that learning-based natural language applications can benefit from.

For this purpose, so far we have only used and evaluated the function PN_WL. PN_WL takes as input a specific word and (optionally) its context, and returns a list of words which are semantically related to the target word in the given context. For example,

PN_WL(V=protest, [peop−legislation−∗−∗−∗]) = [protest, resist, dissent, veto, blackball, negative, forbid, prohibit, interdict, proscribe, disallow].

This function can be implemented via any of the three pruning methods discussed earlier (see Sec. 4). The word lists that this function outputs can be used to augment feature-based representations for other, learning-based NLP tasks. Other access functions of PhraseNet can serve in other ways, e.g., query expansion in information retrieval, but we have not experimented with that yet.

In the experiments we are doing right now, PhraseNet only takes inputs with context information in the format of Eq. 1; semantic categorization and syntactic lemmatization of the context are required in order to match it in the database. However, PhraseNet will, in the future, have functions that allow a user to supply a word token and more flexible contexts.
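As a usage illustration, here is a toy, self-contained rendering of the PN_WL call above (our sketch; the paper specifies the interface only at the level of Table 1, so the signature and storage format are assumptions, and we read the partial context as S=peop, DO=legislation):

# Toy rendering of PN_WL over an in-memory conset table (our assumption
# of the storage format; not the actual PhraseNet implementation).

# Word lists per open role, keyed by the filled slots of a partial context.
TOY_DB = {
    # the partial context [peop - legislation - * - * - *], V slot open
    (("DO", "legislation"), ("S", "peop")): {
        "V": ["protest", "resist", "dissent", "veto", "blackball",
              "negative", "forbid", "prohibit", "interdict",
              "proscribe", "disallow"],
    },
}

def pn_wl(role, word, context):
    """Return the word list attested for `role` in the partial `context`,
    provided `word` itself occurs there (cf. PN_WL in Table 1)."""
    key = tuple(sorted(context.items()))
    wordlist = TOY_DB.get(key, {}).get(role, [])
    return wordlist if word in wordlist else []

print(pn_wl("V", "protest", {"S": "peop", "DO": "legislation"}))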
4 Evaluation and Application

In this section we provide a first evaluation of PhraseNet. We do that in the context of a learning task.

Learning tasks in NLP are typically modelled as classification tasks, where one seeks a mapping g : X → {c1, ..., ck} that maps an instance x ∈ X (e.g., a sentence) to one of c1, ..., ck, representing some property of the instance (e.g., the part-of-speech tag of a word in the context of the sentence). Typically, the raw representation (sentence or document) is first mapped to a feature-based representation, and a learning algorithm is then applied to learn a mapping from this representation to the desired property (Roth, 1998). It is clear that in most cases, representing the mapping g in terms of the raw representation of the input instance, that is, words and their order, is very complex. Functionally simple representations of this mapping can only be formed if we augment the information that is readily available in the input instance with additional, more abstract information. For example, it is common to augment sentence representations with syntactic categories, such as parts of speech (POS), under the assumption that the sought-after property, for which we seek the classifier, depends on the syntactic role of a word in the sentence rather than on the specific word. Similar logic can be applied to semantic categories: in many cases, the property seems not to depend on the specific word used in the sentence, which could be replaced without affecting the property, but rather on its "meaning".

In this section we show the benefit of using PhraseNet in this way, in the context of question classification.

Question classification (QC) is the task of determining the semantic class of the answer to a given question. For example, given the question "What Cuban dictator did Fidel Castro force out of power in 1958?", we would like to determine that its answer should be the name of a person. Our approach to QC follows that of (Li and Roth, 2002). The question classifier used is a multi-class classifier which can classify a question into one of 50 fine-grained classes.

The baseline classifier makes use of syntactic features, such as standard POS information and information extracted by a shallow parser, in addition to the words in the sentence. The classifier is then augmented with standard WordNet or with PhraseNet information as follows. In all cases, words in the sentence are augmented with additional words that are supposed to be semantically related to them. The intuition, as described above, is that this provides a level of abstraction: we could potentially have seen an equivalent question in which other, "equivalent" words occur.

For WordNet, for each word in a question, all its hypernyms are added to its feature-based representation (in addition to the syntactic features). For PhraseNet, for each word in a question, all the words in the corresponding conset word list are added (where the context is supplied by the question).

Our experiments compare the three pruning operations described above. Training is done on a dataset of 21,500 questions. Performance is evaluated by the precision of classifying 1,000 test questions, defined as follows:

Precision = (# of correct predictions) / (# of predictions)   (2)

Table 2 presents the classification precision before and after incorporating WordNet and PhraseNet information into the classifier. By augmenting the question classifier with PhraseNet information, even at this preliminary stage, the error rate of the classifier is reduced by 12%, while an equivalent use of WordNet information reduces the error by only 5.7%.

Information Used           | Precision | Err. Reduction
Baseline                   | 84.2%     | 0%
WordNet                    | 85.1%     | 5.7%
PN: Freq.-based pruning    | 84.4%     | 1.3%
PN: Categ.-based pruning   | 85.0%     | 5.1%
PN: Relation-based pruning | 86.1%     | 12%

Table 2: Question classification with PhraseNet information. Question classification precision and error rate reduction, relative to the baseline error rate (15.8%), when incorporating WordNet and PhraseNet (PN) information. 'Baseline' is the classifier that uses only syntactic features. The classifier is trained over 21,500 questions and tested over 1,000 TREC-10 and TREC-11 questions.
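Schematically, the augmentation and the evaluation metric described above look as follows (our sketch, not the authors' classifier code; `related` stands in for either a WordNet hypernym lookup or a PN_WL conset word list lookup):

# Schematic of the feature augmentation and of Eq. 2 (our sketch, not
# the authors' classifier). `related(word, question)` stands in for the
# WordNet hypernym lookup or the PN_WL conset word list lookup.

def augment(question_tokens, related):
    """Augment a question's word features with semantically related words."""
    features = set(question_tokens)
    for w in question_tokens:
        features.update(related(w, question_tokens))
    return features

def precision(predicted, gold):
    """Eq. 2: the fraction of test questions classified correctly."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(predicted)

# Toy check of the metric.
print(precision(["HUMAN", "LOC", "NUM", "HUMAN"],
                ["HUMAN", "LOC", "DATE", "HUMAN"]))  # 0.75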
5 Related Work

In this section we point to some of the related work on syntax-semantics interaction and lexical semantic resources in computational linguistics and natural language processing. Many current syntactic theories make the common assumption that various aspects of syntactic alternation are predictable via the meaning of the predicate in the sentence (Fillmore, 1968; Jackendoff, 1990; Levin, 1993). With the resurgence of lexical semantics and corpus linguistics during the past two decades, this so-called linking regularity has triggered broad interest in using the syntactic representations found in corpora to classify lexical meaning (Baker et al., 1998; Levin, 1993; Dorr and Jones, 1996; Lapata and Brew, 1999; Lin, 1998b; Pantel and Lin, 2002).

FrameNet (Baker et al., 1998) produces a semantic dictionary that documents the combinatorial properties of English lexical items in semantic and syntactic terms, based on attestations in a very large corpus. In FrameNet, a frame is an intuitive structure that formalizes the links between semantics and syntax in the results of lexical analysis (Fillmore et al., 2001). However, instead of being derived automatically from attested sentences in corpora, each conceptual frame, together with all its frame elements, has to be constructed via slow and labor-intensive manual work; FrameNet is not constructed automatically based on observed syntactic alternations. Though a deep semantic analysis is built for each frame, the lack of automatic derivation of the semantic roles from large corpora (the attempt to label these semantic roles automatically in (Gildea and Jurafsky, 2002) assumes knowledge of the frame and covers only 20% of them) drastically confines the usage of this network.

Levin's classes (Levin, 1993) of verbs are based on the assumption that the semantics of a verb and its syntactic behavior are predictably related. She defines 191 verb classes by grouping 4,183 verbs which pattern together with respect to their diathesis alternations, namely alternations in the expression of their arguments. In Levin's classification, it is the syntactic skeletons (such as np-v-np-pp) that classify verbs directly. Levin's classification is validated via experiments done by (Dorr and Jones, 1996), and some counter-arguments are given in (Baker and Ruppenhofer, 2002). Her work provides a small knowledge source that needs further expansion.

Lin's work (Lin, 1998b; Pantel and Lin, 2002) makes use of distributional syntactic contextual information to define semantic proximity. Dekang Lin's grouping of similar words is a combination of the abstract syntactic skeleton and concrete word tokens: Lin uses syntactic dependencies such as "Subj-people" and "Modifier-red", which combine abstract syntactic notations with their concrete word-token representations. He applies this method to classifying not only verbs, but also nouns and adjectives. While no evaluation has been done to determine whether concrete word tokens are necessary when the syntactic phrase types are already present, Lin's work indirectly shows that the concrete lexical representation is effective.

WordNet (Fellbaum, 1998) is by far the most widely used semantic database. However, this database does not always work as successfully as researchers have expected (Krymolowski and Roth, 1998; Montemagni and Pirelli, 1998). This seems to be due to a lack of topical context (Harabagiu et al., 1999; Agirre et al., 2001) as well as of local context (Fellbaum, 1998). By adding contextual information, many researchers (e.g., (Green et al., 2001; Lapata and Brew, 1999; Landes et al., 1998)) have already made some improvements over it.

The work on the importance of connecting syntax and semantics in developing lexical semantic resources shows the importance of contextual information as a step towards a deeper level of processing. With hierarchical sentential local contexts embedded and used to categorize word classes automatically, we believe that PhraseNet provides the right direction for building a useful lexical semantic database.

6 Discussion and Further Work

We believe that progress in semantics and in developing lexical resources is a prerequisite to any significant progress in natural language understanding. This work makes a step in this direction by introducing a context-sensitive lexical semantic knowledge base system, PhraseNet. We have argued that while current lexical resources like WordNet are invaluable, we should move towards contextually sensitive resources. PhraseNet is designed to fill this gap, and our preliminary experiments with it are promising.

PhraseNet is an ongoing project and is still at a preliminary stage. There are several key issues that we are currently exploring. First, given that PhraseNet draws part of its power from corpora, we are planning to enlarge the corpus used. We believe that data size is very important and will add significant robustness to our current results. At the same time, since constructing PhraseNet relies on machine learning techniques, we need to study extensively the effect of tuning these on the reliability of PhraseNet. Second, there are several functionalities and access functions that we are planning to augment PhraseNet with. Among those is the ability to let a user query PhraseNet even without explicitly specifying the role of words in the context; this would reduce the requirements on users and applications using PhraseNet. Finally, the current PhraseNet has no lexical information about adjectives and adverbs, which may carry important distributional information about the nouns or verbs they modify. We would like to take this information into consideration in the near future.
References

E. Agirre, O. Ansa, D. Martinez, and E. Hovy. 2001. Enriching WordNet concepts with topic signatures.

C. Baker and J. Ruppenhofer. 2002. FrameNet's frames vs. Levin's verb classes. In Proceedings of the 28th Annual Meeting of the Berkeley Linguistics Society.

C. Baker, C. Fillmore, and J. Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the Thirty-Sixth Annual Meeting of the Association for Computational Linguistics and Seventeenth International Conference on Computational Linguistics, pages 86–90, San Francisco, California. Morgan Kaufmann.

R. Barzilay and K. R. McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the 10th Conference of the European Chapter of the ACL.

E. Brill and P. Resnik. 1994. A rule-based approach to prepositional phrase attachment disambiguation. In Proceedings of COLING.

P. F. Brown, S. A. D. Pietra, V. J. D. Pietra, and R. L. Mercer. 1991. Word sense disambiguation using statistical methods. In Proceedings of ACL-1991.

B. Dorr and D. Jones. 1996. Role of word-sense disambiguation in lexical acquisition.

C. Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. The MIT Press.

C. J. Fillmore. 1968. The case for case. In Bach and Harms, editors, Universals in Linguistic Theory, pages 1–88. Holt, Rinehart, and Winston, New York.

C. J. Fillmore, C. Wooters, and C. F. Baker. 2001. Building a large lexical databank which provides deep semantics. In Proceedings of the Pacific Asian Conference on Language, Information and Computation, Hong Kong.

W. A. Gale, K. W. Church, and D. Jarowsky. 1992. A method for disambiguating word senses in large corpora. Computers and the Humanities, 26(5-6):415–439.

D. Gildea and D. Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288.

R. Green, L. Pearl, B. J. Dorr, and P. Resnik. 2001. Lexical resource integration across the syntax-semantics interface. In Proceedings of the WordNet and Other Lexical Resources Workshop, NAACL, Pittsburgh, June.

S. M. Harabagiu, G. A. Miller, and D. I. Moldovan. 1999. WordNet 2 - a morphologically and semantically enhanced resource. In Proceedings of ACL-SIGLEX99: Standardizing Lexical Resources, pages 1–8, Maryland.

R. Jackendoff. 1990. Semantic Structures. MIT Press, Cambridge, MA.

Y. Krymolowski and D. Roth. 1998. Incorporating knowledge in natural language learning: A case study. In COLING-ACL'98 Workshop on the Usage of WordNet in Natural Language Processing Systems.

S. Landes, C. Leacock, and R. I. Tengi. 1998. Building semantic concordances. In C. Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 199–216. The MIT Press.

M. Lapata and C. Brew. 1999. Using subcategorization to resolve verb class ambiguity. In Proceedings of EMNLP, pages 266–274.

B. Levin. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago, IL.

X. Li and D. Roth. 2002. Learning question classifiers. In Proceedings of COLING.

D. Lin. 1998a. Dependency-based evaluation of Minipar. In Workshop on the Evaluation of Parsing Systems, Granada, Spain.

D. Lin. 1998b. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, pages 296–304. Morgan Kaufmann, San Francisco, CA.

S. Montemagni and V. Pirelli. 1998. Augmenting WordNet-like lexical resources with distributional evidence: an application-oriented perspective. In S. Harabagiu, editor, Use of WordNet in Natural Language Processing Systems: Proceedings of the Conference, pages 87–93. Association for Computational Linguistics.

V. Ng and C. Cardie. 2002. Improving machine learning approaches to coreference resolution. In Proceedings of the 40th Annual Meeting of the ACL, Taipei.

P. Pantel and D. Lin. 2000. An unsupervised approach to prepositional phrase attachment using contextually similar words. In Proceedings of the Association for Computational Linguistics, Hong Kong.

P. Pantel and D. Lin. 2002. Discovering word senses from text. In The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

V. Punyakanok and D. Roth. 2001. The use of classifiers in sequential inference. In NIPS-13: The 2000 Conference on Advances in Neural Information Processing Systems, pages 995–1001. MIT Press.
D. Roth. 1998. Learning to resolve natural language ambiguities: A unified approach. In Proceedings of the National Conference on Artificial Intelligence, pages 806–813.

H. Saggion and G. Lapalme. 2002. Generating indicative-informative summaries with SumUM. Computational Linguistics, 28(4):497–526.

J. Stetina and M. Nagao. 1997. Corpus based PP attachment ambiguity resolution with a semantic dictionary. In Proceedings of the 5th Workshop on Very Large Corpora, Beijing and Hong Kong.

E. Voorhees. 2002. Overview of the TREC-2002 question answering track. In The Eleventh TREC Conference, pages 115–123.