Proceedings of the Grammar Engineering Across Frameworks (GEAF07) Workshop

Tracy Holloway King and Emily M. Bender (Editors)

CSLI Studies in Computational Linguistics ONLINE
Ann Copestake (Series Editor)

2007
CSLI Publications
http://csli-publications.stanford.edu/

Contents

1 Editor’s Note
2 Jason Baldridge, Sudipta Chatterjee, Alexis Palmer, and Ben Wing: DotCCG and VisCCG: Wiki and Programming Paradigms for Improved Grammar Engineering with OpenCCG
3 Emily M. Bender: Combining Research and Pedagogy in the Development of a Crosslinguistic Grammar Resource
4 Daniel G. Bobrow, Bob Cheslow, Cleo Condoravdi, Lauri Karttunen, Tracy Holloway King, Rowan Nairn, Valeria de Paiva, Charlotte Price, and Annie Zaenen: PARC’s Bridge and Question Answering System
5 António Branco and Francisco Costa: Accommodating Language Variation in Deep Processing
6 Elizabeth Owen Bratt, Karl Schultz, and Stanley Peters: Challenges in Interpreting Spoken Military Commands and Tutoring Session Responses
7 Lucas Champollion, Joshua Tauberer and Maribel Romero: The Penn Lambda Calculator: Pedagogical Software for Natural Language Semantics
8 Nikos Chatzichrisafis, Dick Crouch, Tracy Holloway King, Rowan Nairn, Manny Rayner, and Marianne Santaholma: Regression Testing For Grammar-Based Systems
9 Ji Fang and Tracy Holloway King: An LFG Chinese Grammar for Machine Use
10 Lars Hellan: On ‘Deep Evaluation’ for Individual Computational Grammars and for Cross-Framework Comparison
11 Tracy Holloway King and John T. Maxwell III: Overlay Mechanisms for Multi-level Deep Processing Applications
12 François Lareau and Leo Wanner: Towards a Generic Multilingual Dependency Grammar for Text Generation
13 Montserrat Marimon, Núria Bel, and Natalia Seghezzi: Test-suite Construction for a Spanish Grammar
14 Yusuke Miyao, Kenji Sagae, Jun’ichi Tsujii: Towards Framework-Independent Evaluation of Deep Linguistic Parsers
15 Stefan Müller: The Grammix CD-ROM. A Software Collection for Developing Typed Feature Structure Grammars
16 Paula S. Newman: Grammars and Programming Languages: To Further Narrow the Gap
17 Nick Pendar: Soft Constraints at Interfaces
18 Yukiko Sasaki Alam: A Morpho-Syntactic Analyzer of Controlled Japanese
19 Tam Wai Lok, Miyao Yusuke, and Tsujii Jun’ichi: Framework Independent Summarized Parser Output in XML and its Example-based Documentation

1 Editor’s Note

The papers in this volume came out of the workshop on Grammar Engineering Across Frameworks held at Stanford University in conjunction with the LSA Linguistics Institute in July 2007. The workshop included a panel discussion on evaluation methodologies and metrics, a regular session, and a demo session.

We would like to thank the Department of Linguistics at Stanford and the LSA Institute for assistance both financial and logistical in putting on the workshop. For logistical support, we are particularly grateful to Vivienne Fong, Anubha Kothari, David Hall and Daria Suk. Additional financial support came from Powerset and CSLI, whom we thank for their sponsorship of the workshop.

Our appreciation also to the program committee, who not only selected the papers to be presented but also provided valuable feedback to the authors: Jason Baldridge, Srinivas Bangalore, John Bateman, Miriam Butt, Aoife Cahill, Stephen Clark, Berthold Crysmann, Steffi Dipper, Dan Flickinger, Ron Kaplan, Montserrat Marimon, Owen Rambow, and Jesse Tseng.

Finally, we would like to acknowledge the workshop participants, many of whom traveled to Stanford from far away.
The discussions were lively and productive, and we hope to see more venues for exchange within the grammar engineering community about topics of mutual interest.

DotCCG and VisCCG: Wiki and Programming Paradigms for Improved Grammar Engineering with OpenCCG

Jason Baldridge†, Sudipta Chatterjee‡, Alexis Palmer†, and Ben Wing‡
†Dept. of Linguistics, ‡Dept. of Computer Science
University of Texas at Austin

Proceedings of the GEAF 2007 Workshop
Tracy Holloway King and Emily M. Bender (Editors)
CSLI Studies in Computational Linguistics ONLINE
Ann Copestake (Series Editor)
2007
CSLI Publications
http://csli-publications.stanford.edu/

Abstract

We present a suite of tools for simplifying the creation and maintenance of grammars for the OpenCCG parsing and realization system. The core of our approach relies on a terse but expressive textual format, DotCCG, for declaring CCG grammars. It supports powerful string expansions that allow grammar developers to eliminate redundancy in the declaration of both morphology and category definitions. Grammars written in this format are converted into the XML utilized by OpenCCG using the ccg2xml utility, which – like a programming language compiler – provides information regarding errors in the grammar, including the type of error and the line number on which it occurs. DotCCG grammars can be edited with VisCCG, a graphical interface which provides visualization of various components of the grammar and allows local editing of information in a manner inspired by wikis. We also report on resources developed to facilitate wide use of the OpenCCG tool suite presented in this paper and on recent uses of the tools in both academic research and classroom environments.

1 Introduction

A major challenge of grammar engineering is enabling users with little computer experience to create complex grammars. Many users encounter significant obstacles and easily get frustrated by trivial syntax errors and non-intuitive formats. At the same time, more experienced users can feel needlessly constrained by grammar engineering aids designed for novice users.
Such frustrations slow users down and can result in a focus on mechanics more than on the grammar itself.

This paper presents two contributions for improving current practice in grammar engineering. First, it provides a terse but expressive format for declaring Combinatory Categorial Grammars (CCG) (Steedman, 2000; Steedman and Baldridge, To appear) that utilizes ideas from software engineering for reducing redundancy in CCG grammars. The basic idea is general enough to be used with other formalisms. Second, it describes a wiki-inspired editing interface, VisCCG, that supports grammar visualization while allowing users to directly edit plain text grammars.

The core motivation for these developments is to improve the grammar development cycle for OpenCCG (openccg.sf.net) (Hockenmaier et al., 2004; Baldridge and Kruijff, 2002; White and Baldridge, 2003), a parsing and realization system that uses CCG, and to provide a model for facilitating grammar development for both novice and expert grammar writers. OpenCCG has long lacked such an environment despite its use in a number of projects. Grammars developed with VisCCG are compiled into OpenCCG’s native XML format, much in the same manner as wiki pages produce HTML. The goal is to create a grammar engineering environment for CCG that is both easy to learn to use and easy to use.

† We would like to thank Emily Bender, Fred Hoyt, Geert-Jan Kruijff, Mark Steedman, Michael White, students in Jason Baldridge’s categorial grammar, computational syntax, and computational linguistics courses at UT Austin in 2006/7, and the participants of the GEAF 2007 workshop for valuable feedback. This research was supported by a Liberal Arts Instructional Technology Grant from the University of Texas at Austin.

We begin by motivating our work in the context of OpenCCG as well as other grammar engineering platforms. In sections 3 and 4 we then briefly introduce CCG and OpenCCG and some of the problems with OpenCCG’s native XML grammar format. Section 5 discusses DotCCG, followed by an extensive discussion of its parameterized macro mechanisms in section 6.
Then we present VisCCG and conclude with a brief discussion of uses of our tools and resources for developing OpenCCG grammars.

2 Motivation

A graphical user interface (GUI) was developed for Grok, OpenCCG’s predecessor, but development ceased as the parsing system itself was improved (see Bierner (2001) and Baldridge (2002) for specific reference to Grok). Developing grammars for OpenCCG has since involved working with unwieldy XML specifications. Our work was initiated to address this (rather large) gap in CCG grammar development.¹ Several aspects of our approach are novel and may be useful in the context of work in other formalisms and/or grammar engineering environments.

The schism between computational definitions and the grammar they are supposed to express has been addressed in various ways, with visualization being a common strategy for more intuitive representations of the grammar. One approach is to develop a GUI for editing objects such as trees and feature structures, such as that of the XTAG system (Doran et al., 2000). The XTAG system included a graphical tree-drawing editor which allowed the user to attach features and labels to nodes of a tree. In such systems, grammar developers usually do not work with the underlying code. A high-level approach like that of the XTAG tree editor is friendly for novice users but can be frustratingly restrictive for experienced users.

An alternative is to develop grammars by working with a low-level format and then visualizing them with a separate GUI which displays information. For example, the LKB system (Copestake, 2002) provides extensive, highly configurable displays of various components of grammars written in the Type Description Language. The display functionality in the XLE system for grammar development in the Lexical-Functional Grammar framework (Butt et al., 1998) is similarly informative and configurable.
In such systems, however, the developer cannot directly edit the grammar using the GUI. Instead, the plain text grammar is edited and then reloaded to view the effect of the modifications in the graphical representation.

¹ Concurrently with our work, Scott Martin and Michael White at Ohio State University developed a complementary tool called grammardoc which produces a set of HTML pages for visualizing OpenCCG grammars. Both grammardoc and our tools are distributed with the OpenCCG system.

An interesting compromise between visualization and low-level specification can be observed with the use of wikis for creating web content. HTML and XML are cumbersome and unintuitive formats; wiki notation as an alternative has enabled lay users to create web content quickly and effectively. For example, in one common wiki syntax, boldfaced text is indicated with double asterisks around the text. This shorthand (Figure 1, line 1) is then converted into HTML (line 2) and displayed as boldfaced text (line 3). Wikis also make it easy to edit small portions of documents while visualizing the rest, and they provide immediate feedback on the visual outcome of edits. DotCCG provides a similar shorthand notation for OpenCCG’s XML, and VisCCG provides user-friendly visualization and editing.

  1  pay **close** attention       wiki syntax
  2  pay <b>close</b> attention    HTML syntax
  3  pay close attention           display (‘close’ rendered in boldface)

Figure 1: Wiki-style notation as shorthand for HTML

Software engineering provides another source of ideas for improving grammar engineering. Most grammar specifications can be viewed as programming languages particularized to natural language, yet grammar platforms typically do not provide much support for error checking and error messages. Our ccg2xml utility compiles DotCCG to OpenCCG’s XML and supports such checking in the process, while VisCCG provides feedback in real-time (during editing).

Integrated Development Environments (IDEs) for programming languages can be used to improve productivity for many developers. A key property of IDEs is that they are optional – a developer may use a plain text editor to write programs if they wish.
We see VisCCG in this light. It is particularly useful for those who are creating their first grammars. In the classroom setting, we observed that users with less experience working with computers tend to stick with editing their grammars using VisCCG, but many others – particularly those with programming experience – switch over to their favorite text editor (e.g. Emacs or Vi) once they understand the DotCCG format. The latter would still periodically load their grammars in VisCCG. We see this availability of choice as a highly desirable feature of the new tools we have developed for OpenCCG: the DotCCG format, ccg2xml, and VisCCG.

3 Combinatory Categorial Grammar

CCG is a lexicalized grammar formalism that has attracted both linguistic and computational interest. It has a universal rule component that drives the combination of categories and their semantics to provide compositional analyses for sentences. Categories may be either atomic elements or (curried) functions which specify the canonical linear direction in which they seek their arguments. Some simplified example lexical entries are given below:

  Olivia := np      the := np/⋆n
  Finn := np        saw := (s\np)/np
  plane := n        thinks := (s\np)/⋄s

The most basic rules are forward (>) and backward (<) application. CCG also utilizes rules based on the composition (B), type-raising (T), and substitution (S) combinators of combinatory logic. The rules of CCG are:²

  (>)    X/⋆Y  Y  ⇒  X              (<)    Y  X\⋆Y  ⇒  X
  (>B)   X/⋄Y  Y/⋄Z  ⇒  X/⋄Z        (<B)   Y\⋄Z  X\⋄Y  ⇒  X\⋄Z
  (>B×)  X/×Y  Y\×Z  ⇒  X\×Z        (<B×)  Y/×Z  X\×Y  ⇒  X/×Z
  (>T)   X  ⇒  Y/(Y\X)              (<T)   X  ⇒  Y\(Y/X)

Each rule is keyed to a modality; this allows lexical items to selectively utilize some rules but not others. For example, the /⋆ slash on the category for ‘the’ keeps the composition rules from causing ungrammatical word order permutations within English noun phrases. See Baldridge (2002) and Baldridge and Kruijff (2003) for full explication of the computational and linguistic significance of modalities.

Though the application rules do the majority of the work, the others are crucial for building the non-standard constituents for which categorial grammars are well-known.
With these rules and the categories given above, we can provide an incremental derivation for a sentence such as ‘Finn thinks Olivia saw the plane’:

  Finn        thinks        Olivia        saw           the     plane
  np          (s\np)/⋄s     np            (s\np)/np     np/⋆n   n
  -------->T                -------->T                  ------------->
  s/(s\np)                  s/(s\np)                    np
  ------------------------>B
  s/⋄s
  -------------------------------------->B
  s/(s\np)
  ---------------------------------------------------->B
  s/np
  ------------------------------------------------------------------>
  s

The constituent s/np derived above for ‘Finn thinks Olivia saw’ is also used in analyses for relative clauses like ‘the plane that [Finn thinks Olivia saw]’ and right-node raising sentences like ‘[Kestrel heard] and [Finn thinks Olivia saw] the plane’.

There has been a great deal of work in computational linguistics using CCG over the past two decades, and there is an even greater degree of activity in recent years. A major development was the creation of CCGbank (Hockenmaier and Steedman, 2007), which has allowed the creation of fast and accurate probabilistic CCG parsers for producing deep dependencies (Hockenmaier, 2003; Bos et al., 2004; Clark and Curran, 2007). CCG has also been used to induce semantic parsers from sentences paired with logical forms (Zettlemoyer and Collins, 2007).

Work with OpenCCG represents another major branch of CCG research. It is used for testing and developing syntactic and semantic analyses (Bierner, 2001; Baldridge, 2002; Kruijff and Baldridge, 2004; Gerstenberger and Wolska, 2005) and for research into CCG parsing and realization (Hockenmaier et al., 2004; White and Baldridge, 2003; White, 2006b; White et al., 2007). It performs parsing/realization in the systems of a number of projects, many of which are given in Figure 2. Most of these are dialog systems, including natural language interfaces for robots (CoSy, JAST, and INDIGO) and MP3 systems (SAMMIE).

² We exclude substitution here for space reasons. An example is >S: (X/⋄Y)/⋄Z  Y/⋄Z  ⇒  X/⋄Z.
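As an illustration only, the derivation in this section can be replayed with a few lines of Python. This is a minimal sketch and not part of OpenCCG or DotCCG: the class names `Atom` and `Func`, the one-character modality codes, and the choice to give the result of composition the modality of the right-hand category are all assumptions made for the example. Only forward application (>), forward harmonic composition (>B), and forward type-raising (>T) are covered.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Assumed encoding of modalities: "*" for ⋆, "d" for ⋄, "x" for ×, "." for unmarked.
MODE_SYMBOL = {"*": "⋆", "d": "⋄", "x": "×", ".": ""}
HARMONIC = {"d", "."}  # modalities assumed to license harmonic composition


@dataclass(frozen=True)
class Atom:
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Func:
    result: "Cat"
    slash: str          # "/" seeks its argument to the right, "\\" to the left
    arg: "Cat"
    mode: str = "."

    def __str__(self) -> str:
        def wrap(c: "Cat") -> str:
            return f"({c})" if isinstance(c, Func) else str(c)
        return f"{wrap(self.result)}{self.slash}{MODE_SYMBOL[self.mode]}{wrap(self.arg)}"


Cat = Union[Atom, Func]


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    """(>)  X/Y  Y  =>  X, available to every modality."""
    if isinstance(left, Func) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    """(>B)  X/Y  Y/Z  =>  X/Z, only if both slashes license composition."""
    if (isinstance(left, Func) and isinstance(right, Func)
            and left.slash == "/" and right.slash == "/"
            and left.mode in HARMONIC and right.mode in HARMONIC
            and left.arg == right.result):
        # Assumption: the result slash keeps the modality of the right-hand category.
        return Func(left.result, "/", right.arg, right.mode)
    return None


def type_raise(x: Cat, y: Cat) -> Cat:
    """(>T)  X  =>  Y/(Y\\X)."""
    return Func(y, "/", Func(y, "\\", x))


s, np, n = Atom("s"), Atom("np"), Atom("n")
vp = Func(s, "\\", np)
LEXICON = {
    "Olivia": np, "Finn": np, "plane": n,
    "the": Func(np, "/", n, "*"),
    "saw": Func(vp, "/", np),
    "thinks": Func(vp, "/", s, "d"),
}

# Replay the incremental derivation of 'Finn thinks Olivia saw the plane'.
finn = type_raise(LEXICON["Finn"], s)
finn_thinks = forward_compose(finn, LEXICON["thinks"])
olivia = type_raise(LEXICON["Olivia"], s)
finn_thinks_olivia = forward_compose(finn_thinks, olivia)
clause = forward_compose(finn_thinks_olivia, LEXICON["saw"])
the_plane = forward_apply(LEXICON["the"], LEXICON["plane"])
sentence = forward_apply(clause, the_plane)

for label, cat in [("Finn thinks", finn_thinks),
                   ("Finn thinks Olivia", finn_thinks_olivia),
                   ("Finn thinks Olivia saw", clause),
                   ("the plane", the_plane),
                   ("whole sentence", sentence)]:
    print(f"{label:24} {cat}")
```

In this sketch the ⋆ modality on ‘the’ has the effect described in the text: `forward_compose(LEXICON["the"], Func(n, "/", n))` returns `None`, so the determiner can only combine by application.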
  Project     References / Website
  AdaRTE      Rojas-Barahona (2007)
              http://www.labmedinfo.org/research/adarte/adarte.htm
  COMIC       Foster and White (2005, 2007); Nakatsu and White (2006); White (2006a)
              http://www.hcrc.ed.ac.uk/comic/
  CoSy        Kruijff et al. (2007)
              http://www.cognitivesystems.org
  CrAg        Isard et al. (2006)
              http://www.hcrc.ed.ac.uk/crag/
  DIALOG      Wolska and Kruijff-Korbayová (2004); Benzmüller et al. (2007)
              http://www.ags.uni-sb.de/~dialog/
  FLIGHTS     Moore et al. (2004)
  INDIGO      http://www.ics.forth.gr/indigo/
  JAST        Rickert et al. (2007)
              http://www.euprojects-jast.net/
  Methodius   Isard (2007)
              http://www.ltg.ed.ac.uk/methodius/
  SAMMIE      Becker et al. (2006)
              http://www.talk-project.org

Figure 2: Example projects that use OpenCCG for parsing and realization.

4 OpenCCG’s XML Format

The underlying native specification format of OpenCCG is XML. Grammatical information is split across six interdependent files, some of which define components that were directly inspired by XTAG (Doran et al., 2000). Each file defines a major component of the grammar, including (a) a structured lexicon containing families of lexical entries, (b) a morphological database pairing words with their stems and morphological features, (c) morphological macros instantiating feature values on lexical entries, (d) a hierarchy of typed features, (e) a set of parameterized CCG rules, and (f) a testbed of sentences used for simple regression testing.

As an example of what is involved in creating lexical entries in OpenCCG, Figure 3 shows a fragment of the XML lexicon, morphology, and typed-feature files for an Ojibwe³ grammar. This fragment defines a noun family that has a single lexical category, which contains three lexical items: gaago ‘porcupine’, kwe ‘woman’, and mzinig ‘book’. Each lexical item inflects with four forms: singular proximate, singular obviative, plural proximate, and plural obviative. The inflectional suffixes vary according to the stem. Gaago and kwe are of animate gender, while mzinig is inanimate.
A basic feature hierarchy is defined, consisting of person (2nd, 1st, 3rd, non3rd), number (singular, plural), gender (animate, inanimate), and obviation status (proximate, obviative). Note that the majority of the XML for defining the feature hierarchy has been truncated for space reasons.

Developing grammars directly in XML is time-consuming and error prone. XML was designed as a format to standardize communication of data among computers, not for direct editing by humans. Furthermore, OpenCCG’s XML format contains many redundancies and interdependencies, leading to errors when a change is made in one place and not propagated elsewhere. For example, the association between the part of speech N and the three lexical items is declared in the lexicon file and in multiple places throughout the morphology file. The declarations of multiple inflected forms of the same stem are also highly repetitive and fail to express any generalizations over the forms. Finally, the features attached to

³ Ojibwe is an Algonquian language of the upper Great Lakes region and southeastern Ontario.
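The redundancy problem described in this section, where the same stem and part-of-speech pairing is stated once in the lexicon and again for every inflected form in the morphology, can be sketched in Python. Everything here is hypothetical: the dictionaries stand in for two of the six XML files and do not reproduce OpenCCG’s actual schema, and the function names `morph_rows` and `find_mismatches` are invented for the illustration. Surface forms are left out because the excerpt does not give the Ojibwe suffixes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical stand-in for the lexicon file: family (part of speech) -> member stems.
lexicon_families = {"N": {"gaago", "kwe", "mzinig"}}

NUMBERS = ("singular", "plural")
OBVIATION = ("proximate", "obviative")


def morph_rows(stems, pos):
    """Hypothetical stand-in for the morphology file: one row per inflected form.

    Each row repeats the stem and part of speech, which is the redundancy at issue.
    """
    return [
        {"stem": stem, "pos": pos, "number": num, "obviation": obv}
        for stem in stems
        for num, obv in product(NUMBERS, OBVIATION)
    ]


def find_mismatches(families, rows):
    """List morphology rows the lexicon does not back, and members with no rows."""
    problems = []
    stems_seen = defaultdict(set)
    for row in rows:
        stems_seen[row["pos"]].add(row["stem"])
        if row["stem"] not in families.get(row["pos"], set()):
            problems.append(f"morphology: {row['stem']} tagged {row['pos']}, unknown to lexicon")
    for pos, members in families.items():
        for stem in sorted(members - stems_seen[pos]):
            problems.append(f"lexicon: {stem} in family {pos} has no morphology rows")
    return problems


rows = morph_rows(["gaago", "kwe", "mzinig"], "N")
consistent = find_mismatches(lexicon_families, rows)

# Rename the part of speech in the lexicon only, leaving the morphology rows stale.
renamed = {"Noun": lexicon_families["N"]}
stale = find_mismatches(renamed, rows)
print(len(rows), len(consistent), len(stale))
```

A single rename in the lexicon invalidates every one of the repeated morphology rows, which is the kind of unpropagated change the text identifies as a source of errors. A format that states the pairing once would leave only one place to edit.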
