Computers and the Humanities 32: 1–38, 1998.
© 1998 Kluwer Academic Publishers. Printed in the Netherlands.

Tagging and the Case of Pronouns

JANNE BONDI JOHANNESSEN
The Text Laboratory, Department of Linguistics, University of Oslo, N-0317 Oslo, Norway
(e-mail: [email protected])

Key words: Constraint Grammar, corpus investigation, nominative pronouns in English and Norwegian, statistic infrequency, Subject position

Abstract. It is important to use a corpus to investigate grammatical phenomena empirically before writing grammatical rules or constraints for a disambiguating tagger. The paper shows how even case distinctions on pronouns are used more diversely than is usually assumed. Both in English and Norwegian, nominative pronouns are used in more positions than the expected Subject one. Although the other uses are statistically less frequent, they may be important to the users of the resulting tagged corpus, who are often theoretical linguists. A tagger should therefore also tag the more infrequent constructions correctly. The paper shows how this can be done in a Constraint Grammar type tagger.

1. Introduction¹

Tagging is often restricted to word level information such as part of speech and morphosyntactic categories like number, gender, case or other typically inflectional categories. Some taggers also give syntactic function tags, which is very useful for the user of the tagged text as well as for the tagger itself, the latter being able to use this information as an aid in the disambiguation process. If a word is unambiguous at a morphological as well as a syntactic level, it will be of great help to the tagger.

Pronouns are often assumed to be morphologically and syntactically unambiguous, in that they have case distinctions that correspond to syntactic distinctions. In this paper, I shall show that pronouns can be much more ambiguous than is often assumed. However, I shall demonstrate that corpus investigations reveal that different pronouns occur in different syntactic positions as well as in different text types.
Rather than treating all pronouns alike, we can, when designing the tagger, make use of the fact that different pronouns differ considerably with respect to their syntactic context. Many of the problems discussed here were discovered during our work at the University of Oslo on designing a constraint-based disambiguating tagger for Norwegian.²

The paper is structured in the following way. First, some ambiguity problems in Norwegian are discussed, and the pronominal system is given. The question of marking the syntactic function of pronouns directly in the lexicon is introduced. An investigation of Norwegian nominative pronouns used in non-Subject contexts is presented in Section 3, and one of English nominative pronouns in Section 4. Section 5 suggests a way of tagging the various non-Subject nominatives for Norwegian.

2. Problems with Homonymy

2.1. HOMONYMY IN NORWEGIAN

Whenever a word form represents several lexemes or grammatical features, a disambiguating tagger should be able to choose between the various alternatives and pick one. This means that the tagger ideally gives one tag only per word:

(1) An ideal tagger result: Mannen laget bobler “The man made bubbles”
    Mannen  <“MAN”, N, Sg, def, common, SUBJ>
    laget   <“MAKE”, V, pret, active>
    bobler  <“BUBBLE”, N, Pl, indef, common, OBJ>

In Norwegian, there is widespread homonymy between certain inflectional forms of different lexemes, and disambiguation can turn out to be very difficult. Some suffixes can be used for a variety of purposes. Consider for example those below.
(2) Two examples of suffixes that cause widespread homonymy
    -er:  V, present tense       (murer ‘plasters’)
          N, plural indefinite   (murer ‘walls’)
          N, agentive            (murer ‘mason’)
    -a:   V, preterite           (hoppa ‘jumped’)
          V, past participle     (hoppa ‘jumped’)
          N, singular definite   (hoppa ‘the mare’)
          N, plural definite     (hoppa ‘the jumps’)

Therefore, many sentences are multiply ambiguous:

(3) A sentence with multiply ambiguous word forms: Fisker blåser bobler
    Fisker  (“FISHERMAN”, N, Sg, indef, common)
            (“FISH”, N, Pl, indef, common)
            (“FISH”, V, pres, active)
    blåser  (“BLOWER”, N, Sg, indef, common)
            (“WHISTLE”, N, Pl, indef, common)
            (“BLOW”, V, pres, active)
    bobler  (“BUBBLE”, N, Pl, indef, common)
            (“BUBBLE”, V, pres, active)

Since Norwegian is not a free word order language, it is, fortunately, not the case that just any combination of the three words above with just any of the tags will actually be an acceptable sentence. Norwegian is a V2 language, from which it follows that in a main clause, the verb has to be in the second position (declarative clauses) or in the first position (questions). Of all the possibilities above, the option of being a verb is therefore not possible for the third word; it must be a noun. However, which of the words is Subject and which is Object?³ The disambiguating English Constraint Grammar (ENGCG) tagger made in Helsinki for English (see, e.g., Karlsson et al., 1995) includes a number of syntactic function labels in addition to the morphological tags. It is desirable for the Norwegian tagger, too, to include syntactic information.

In English, which is not a V2 language, it would have been easy to give the answer: If the verb is in the second position in the clause, the noun (phrase) that precedes it must be a Subject. A noun phrase in the first position of the clause can only be analysed as a fronted Object if there is a noun phrase following it and there is a verb phrase in the third position:

(4) English clauses with initial Subject and initial fronted Object, respectively:
    a. Fishes blow bubbles
    b.
       Bubbles, fishes blow

Since Norwegian is a V2 language, however, there is no formal difference between a clause which has a Subject initially and one which has a fronted Object initially. In both cases, there is a noun phrase on either side of the verb.

(5) Norwegian clauses with initial Subject and initial fronted Object, respectively:
    a. Fisker blåser bobler
    b. Bobler blåser fisker

The developers of the ENGCG tagger have constructed a main rule saying that the Subject can be found just left of the main verb:

(6) The word on the immediate left of the finite verb is the Subject.
    (Adapted from Anttila, 1995, p. 321)

A similar rule is of course useless for Norwegian. We have to look for a different way of determining the syntactic functions of the clausal components. One possibility is to start by tagging single words that are unambiguous with regard to both form and syntactic function. They can be helpful as cornerstones for the rest of the tagging.

The ENGCG tagger from Helsinki for the most part uses syntactic information to determine the syntactic category of words and phrases, but some words have been assigned a syntactic category already at a lexical level, i.e., as the words are being looked up. Obviously, the earlier the tagger is able to give an unambiguous tag to a word, the better it is for the disambiguation of the rest of the words in the clause, since the words already tagged can help rule out a few cases. Some Norwegian function words are unambiguous and could be syntactically marked already at a lexical level:

(7) Some words that can be given a syntactic tag even at a lexical level:
    at    <THAT, Sub-Conj>, introduces a subordinate clause
    hvis  <IF, Sub-Conj>, introduces a conditional clause

2.2. CASE DISTINCTIONS IN PRONOUNS

It would be useful if there were a way we could tag individual words in the lexicon in order to help us find Subjects (and thus Objects). If Norwegian were a language that inflected its nouns for case, the task would have been simple. But, as in English, nouns are invariant with respect to case.
However, also like English, there are distinct nominative and accusative forms of some of the personal pronouns.⁴ Below is the full paradigm of pronouns in the bokmål version of written Norwegian:

(8) Norwegian personal pronouns: case distinctions

    Person/Number          Nominative   Accusative
    1Sg                    jeg          meg
    2Sg                    du           deg
    3SgM                   han          han/ham
    3SgF                   hun/ho       henne
    3SgN                   det          det
    3Sg M+F, non-human     den          den
    1Pl                    vi           oss
    2Pl                    dere         dere
    3Pl                    de           dem

There are case distinctions in the five pairs of 1st singular, 2nd singular, 3rd singular feminine, 1st plural, and 3rd plural pronouns. There is some individual variation as to whether there is a case distinction in the 3Sg masculine pronoun pair. There are no case distinctions in the 3Sg neuter, 3Sg non-human, and 2Pl pronoun pairs. But could we use at least the five pronominal pairs that do have case distinctions, and mark the nominative forms as Subjects and the accusative forms as Objects? This question is worth investigating, but first we should notice that the 3rd plural nominative pronoun has the same form as the very common definite plural determiner. The homonymy of the pronoun and determiner de ‘they’, ‘the’ is illustrated below. Fortunately, the ambiguity between these two words is easily resolved, since they obviously occur in quite different syntactic contexts.

(9) Ambiguous de: 3Pl nominative pronoun or 3Pl definite determiner:
    a. De    står   der    borte
       they  stand  there  away
       “They are standing over there.”
    b. De   store  barna     står   der    borte
       the  big    children  stand  there  away
       “The big children stand over there.”

2.3. LEXICAL TAGGING OF SYNTACTIC FUNCTIONS?

If we could tag pronouns with syntactic function tags in the lexicon, a lot would be gained.
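As a minimal sketch of what such lexical tagging would amount to, the paradigm in (8) can be written down as data and the case-distinct pairs picked out mechanically. This is an illustration only, not part of the Norwegian tagger: the names `PARADIGM`, `case_status` and `naive_lexicon` are hypothetical, the labels SUBJ and OBJ are borrowed from example (1), and the mapping deliberately ignores the homonymy of de shown in (9).

```python
# Paradigm (8): slot -> (nominative forms, accusative forms).
PARADIGM = {
    "1Sg": ({"jeg"}, {"meg"}),
    "2Sg": ({"du"}, {"deg"}),
    "3SgM": ({"han"}, {"han", "ham"}),
    "3SgF": ({"hun", "ho"}, {"henne"}),
    "3SgN": ({"det"}, {"det"}),
    "3Sg non-human": ({"den"}, {"den"}),
    "1Pl": ({"vi"}, {"oss"}),
    "2Pl": ({"dere"}, {"dere"}),
    "3Pl": ({"de"}, {"dem"}),
}


def case_status(nominative, accusative):
    """Classify a pronoun pair as 'distinct', 'variable' or 'none'."""
    if nominative.isdisjoint(accusative):
        return "distinct"   # no form is shared between the two cases
    if nominative == accusative:
        return "none"       # the same form(s) serve both cases
    return "variable"       # some accusative variant equals the nominative


def naive_lexicon(paradigm):
    """Hypothetical lexical tagging: nominative -> SUBJ, accusative -> OBJ,
    applied only to pairs with a clear case distinction."""
    lexicon = {}
    for nominative, accusative in paradigm.values():
        if case_status(nominative, accusative) != "distinct":
            continue
        for form in nominative:
            lexicon[form] = "SUBJ"
        for form in accusative:
            lexicon[form] = "OBJ"
    return lexicon
```

Calling `naive_lexicon(PARADIGM)` leaves han, ham, det, den and dere untagged, and it tags de as SUBJ regardless of whether it is the pronoun or the determiner, so that homonymy would have to be resolved separately.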
Although, as a class of lexical items, pronouns are few, they are of course extremely frequent in most texts. In constructing the ENGCG tagger for English, it has been assumed that all nominative pronouns are syntactic Subjects of the clauses in which they occur. See the following citations from work discussing the ENGCG tagger:

(10) “For example, we know that the form she will always be Subject, hence the tag @SUBJ may be included in its lexical entry.”
     (Voutilainen, Heikkilä and Anttila, 1992, p. 15)

     “Words with lexically predictable functions include [...] and, crucially, pronouns such as he and they, which are Subject by the lexicon.”
     (Anttila, 1995, p. 334)

The output from the morphological analyser used by the ENGCG tagger shows that all the English pronouns that have nominative-accusative case distinctions have been given syntactic function labels lexically:

(11) Output from ENGTWOL 5/3-97, http://www.lingsoft.fi/cgi-pub/engtwol:
     "<*i>"
         "i" <*> <NonMod> PRON PERS NOM SG1 SUBJ @SUBJ
         "i" <*> ABBR NOM SG
     "<he>"
         "he" <NonMod> PRON PERS MASC NOM SG3 SUBJ @SUBJ
     "<she>"
         "she" <NonMod> PRON PERS FEM NOM SG3 SUBJ @SUBJ
     "<we>"
         "we" <NonMod> PRON PERS NOM PL1 SUBJ @SUBJ
     "<they>"
         "they" <NonMod> PRON PERS NOM PL3 SUBJ @SUBJ

Although, undoubtedly, this method gives the right results in many cases, there are some grammatical constructions in which the lexical solution gives wrong results (the underlining below is done by JBJ):

(12) Output from an ENGCG analysis 5/3-97, http://www.lingsoft.fi/cgi-pub/engcg:
     a.
     "<*he>"
         "he" <*> <NonMod> PRON PERS MASC NOM SG3 SUBJ @SUBJ
     "<saw>"
         "see" <as/SVOC/A> <SVO> <SV> <InfComp> V PAST VFIN @+FMAINV
     "<*john>"
         "john" <*> <Proper> N NOM SG @OBJ
     "<and>"
         "and" CC @CC
     "<*i>"
         "i" <*> <NonMod> PRON PERS NOM SG1 SUBJ @SUBJ
     "<last>"
         "last" <Genord> DET POST SG/PL @QN>
     "<night>"
         "night" N NOM SG @ADVL
     "<$.>"

     b.⁵
     "<*she>"
         "she" <*> <NonMod> PRON PERS FEM NOM SG3 SUBJ @SUBJ
     "<is>"
         "be" <SV> <SVC/N> <SVC/A> V PRES SG3 VFIN @+FMAINV
     "<bigger>"
         "big" A CMP @PCOMPL-S
     "<than>"
         "than" PREP @<NOM
     "<he>"
         "he" <NonMod> PRON PERS MASC NOM SG3 SUBJ @SUBJ
     "<$.>"

3. An Investigation of Nominative Non-subject Pronouns in Actual Texts

It is tempting to assume that there is a direct correspondence between morphological case and syntactic function. This is often assumed in the linguistics literature generally.⁶ However, not infrequently, things are different in reality, and an investigation of actual corpora may be revealing. Before attempting the lexical solution for Norwegian nominative pronouns, I conducted a study of two different texts (novels) to examine the use of such pronouns. The novels were Gunhild by Ragnhild Magerøy (100 000 words) and Amatøren (40 000 words) by Lars Saabye Christensen, of which the first one is written by a female writer and set in rural Norway, while the other is written by a male writer and set in Oslo.⁷ Confirming a slight suspicion, supported by some remarks in Faarlund, Lie and Vannebo (1997) (to be commented on later in this section), but still somewhat surprisingly, it turned out that nominative pronouns are indeed used in other constructions than as Subjects only. Below is a list of constructions in which nominative pronouns were found as something other than Subject. (Abbreviations used: Gu = Gunhild, Am = Amatøren)

A. Syntactic functions other than Subject:

(13) Right Dislocation
     a. Gunhild er da jente ho og [she too], [...] (Gu.)
        “But Gunhild too is a girl.”
     b. Du må kjøpe deg en flaske du også [you too], hørte jeg ham si. (Am.)
        “You have to buy yourself a bottle, you too, I heard him say.”

(14) Left Dislocation
     a. Je og du [I and you], vi bryr oss itte om slarvet. (Gu.)
        “You and me, we don’t care about the gossip.”

(15) Middle Dislocation⁸
     Og det er ho, bare ho [just she] som får se Pål slik. (Gu.)
     “And it is she, only she, who gets to see Pål like this.”

(16) Cleft
     a. Det var da ikke ho som [it wasn’t she who] hadde skubbet til ham, [...] (Gu.)
        “But it wasn’t she who had pushed him ...”
     b. Det var du som [it was you who] ristet i sengen," sa Else og så på meg. (Am.)
        “It was you who shook in bed, said Else and looked at me.”

(17) Sentence Fragment
     a. E tykt e vart aldeles ålein med det. Berre e ålein. [just me alone] (Gu.)
        “I thought I was completely alone with it. Just me alone.”
     b. Så var det at bena mine ikke ville mer. Eller jeg [or I]. (Am.)
        “Then it happened that my legs couldn’t take any more. Or I.”

(18) Predicate
     a. Er det du [is that you], Gunhild? (Gu.)
        “Is that you, Gunhild?”
     b. I alle fall var det han [...] (Am.)
        “At least it was him ...”

(19) Simple Object and Subject-raised-to-Object
     a. Her inn på kjøkkena har e sett ho [she]. (Gu.)
        “Here in the kitchen I have seen her.”
     b. Med en del besvær fikk jeg styrt han [he] ut på kjøkkenet. (Am.)
        “With some effort I was able to guide him into the kitchen.”

(20) Comparative Object
     Den som har røynt såpass i livet som Marja Simensdatter, og hatt så mange karer som ho [as she], skulle lettere merke om noe er på ferde mellom kar og kvinnfolk (Gu.)
     “The one who has as much experience in life as M. S., and had as many men as her, should easier feel if something is going on between man and woman.”

(21) Prepositional Object
     [E]in set der og meine det så inderleg når ’n tala vel om ho [about she] (Gu.)
     “One sits there and means it so deeply when one talks nicely about her.”
     [Hun] var sammen med han [he] en stund rundt juletider. (Am.)
     “She was going out with him for a while around Christmas.”

(22) Exclamation
     Å du, å du [oh you oh you], alt det udyr Jon har sett. (Gu.)
     “Oh my word, all the horror that Jon has seen.”

B. Pronouns homonymous with other parts of speech:

(23) Determiner (Non-pronominal use)
     a. Hm-nei, men det dimmest før ho Marit [she Marit] no, stakkar. (Gu.)
        “Mm no, but it looks darker for Marit, now, poor one.”
     b. Det var han fyren [he the guy] med cowboystøvlene igjen. (Am.)
        “It was that guy with the cowboy boots again.”

(24) Noun (Non-pronominal use)
     Vi er bare skygger av vårt tidligere jeg [I] (The newspaper Aftenposten)
     “We are just shadows of our earlier self.”

C. Pronouns modified by other phrases:

(25) With Relative clause
     a. E tala om ho som ’n ha gåan her på beit [she who he has going here on grass] (Gu.)
        “I talked about the one he keeps here.”
     b. Har du hørt om han som ikke ville [he who didn’t want to] tro på Gud [...] (Am.)
        “Have you heard about he who didn’t want to believe in God ...”

(26) With PP
     a. ... men mens mora gir en blanken i hva andre vil, nøyer ikke ho på loftet [she on the.loft] seg med det. (Gu.)
        “... but while the mother couldn’t care what others think of one, the one on the loft does not stop it at that.”
     b. Hun med lommetørklet [she with the handkerchief] er moren min og han med fotografiapparatet [he with the camera] som går rundt på gulvet, er faren min," svarte jeg. (Am.)
        “The lady with the handkerchief is my mother and the man with the camera who is walking around on the floor is my father, I replied.”

(27) With NP
     a. Jeg begynner med at vi alle [we all] er på pakketur til jorden (Am.)
        “I’ll start with [the idea] that we are all on a charter tour to the Earth.”
     b. - Vi mennesker [we people]," ropte Jakob høyt. (Am.)
        “We human beings, Jakob shouted loudly.”