ebook img

DTIC ADA531042: Question Generation via Overgenerating Transformations and Ranking PDF

0.63 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview DTIC ADA531042: Question Generation via Overgenerating Transformations and Ranking

Question Generation via Overgenerating Transformations and Ranking Michael Heilman and Noah A. Smith CMU-LTI-09-013 Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213 www.lti.cs.cmu.edu (cid:13)c2009, Michael Heilman and Noah A. Smith Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. 1. REPORT DATE 3. DATES COVERED 2009 2. REPORT TYPE 00-00-2009 to 00-00-2009 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Question Generation via Overgenerating Transformations and Ranking 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION Carnegie Mellon University,School of Computer Science,5000 Forbes REPORT NUMBER Ave,Pittsburgh,PA,15213 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S) 11. SPONSOR/MONITOR’S REPORT NUMBER(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release; distribution unlimited 13. SUPPLEMENTARY NOTES 14. ABSTRACT We describe an extensible approach to generating questions for the purpose of reading comprehension assessment and practice. Our framework for question generation composes general-purpose rules to transform declarative sentences into questions is modular in that existing NLP tools can be leveraged, and includes a statistical component for scoring questions based on features of the input, output, and transformations performed. In an evaluation in which humans rated questions according to several criteria, we found that our implementation achieves 43.3% precisionat- 10 and generates approximately 6.8 acceptable questions per 250 words of source text. 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18. NUMBER 19a. NAME OF ABSTRACT OF PAGES RESPONSIBLE PERSON a. REPORT b. ABSTRACT c. THIS PAGE Same as 15 unclassified unclassified unclassified Report (SAR) Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 Abstract We follow earlier attempts to apply computa- tional linguistics methods to QG (Mitkov et al., We describe an extensible approach to 2006; Kunichika et al., 2004; Gates, 2008). Our generating questions for the purpose of key contribution is the description and evaluation reading comprehension assessment and ofanimplementedframeworkthat(1)isdesigned practice. Ourframeworkforquestiongen- to be domain-general, (2) uses extensible compo- erationcomposesgeneral-purposerulesto sitions of rules to transform declarative sentences transformdeclarativesentences intoques- into questions, (3) explicitly handles relevant lin- tions,ismodularinthatexistingNLPtools guistic phenomena such as island constraints, (4) can be leveraged, and includes a statisti- is modular, permitting existing NLP tools to be calcomponentforscoringquestionsbased leveraged, and (5) includes a learned component onfeaturesoftheinput,output,andtrans- forrankinggeneratedquestionsaccordingtovari- formations performed. In an evaluation ouscriteria. in which humans rated questions accord- In§2wedescribeourframeworkforQG.§3de- ing to several criteria, we found that our tails our specific implementation. In §4 we de- implementationachieves43.3%precision- scribe a study in which humans rated automat- at-10 and generates approximately 6.8 ically generated questions from Wikipedia and acceptable questions per 250 words of news articles. The results show that our imple- sourcetext. mentationachieves43.3%acceptabilityforthetop 1 Introduction 10 ranked questions (i.e., 43.3% precision-at-10). We then address related work (§5) and conclude The generation of interrogative sentences by hu- (§7). mans has long been a major topic in linguistics, motivating various theoretical work (e.g., Ross, 2 Framework 1967),inparticularthosethatviewaquestionasa transformationofacanonicaldeclarativesentence Wedefineaframeworkforgeneratingarankedset (Chomsky, 1973). In computational linguistics, of fact-based questions about the text of a given questions have also been a major topic of study, article. From this set, the top-ranked questions butprimarilywiththegoalofansweringquestions might be given to an educator for filtering and re- (Dangetal.,2008). vision,orperhapsdirectlytoastudentforpractice. Automatic question generation (QG) is also a Thegenerationofasinglequestioncanbeuse- worthwhile enterprise, with applications in dia- fully decomposed into the three modular stages logue systems (Walker et al., 2001) and educa- depictedinFigure1. tionaltechnologies(Graesseretal.,2005),forex- Many useful questions can be viewed as lexi- ample. With respect to education, QG mecha- cal, syntactic, or semantic transformations of the nisms might improve and diversify the questions declarative sentences in a text. In stage 1, a se- posed to students by tutoring systems, leading to lectedsentenceorasetofsentencesfromthetext more natural and effective student-tutor interac- is transformed into one declarative sentence by tions. QGmightalsobeusedwitharbitrarysource optionally altering or transforming lexical items, textsforinstructioninlessinteractivesettings. syntactic structure, and semantics. Many existing Inthispaper,wefocusonQGinserviceofprac- NLP transformations might be exploited in this tice and assessment materials for literacy instruc- stage, including extractive summarization, sen- tion. Specifically, we focus on educational read- tencecompression,sentencesplitting,sentencefu- ingtoacquirecontent knowledge(ratherthanim- sion, paraphrase, textual entailment, lexical se- proving reading ability). Our aim is to generate manticsforwordsubstitution. questions that assess the content knowledge that Instage2,thedeclarativesentenceisturnedinto a student has acquired upon reading a text. We a question by executing a set of well-defined syn- restrict our investigation to fact-based questions tactic transformations (WH-movement, subject- aboutliteralinformationpresentinthetext,butwe auxiliary inversion, etc.), many of which are pos- believeourtechniquescanbeextendedtogenerate sible. Wecallthismodulea“questiontransducer.” questionsinvolvinginferenceanddeeperlevelsof Since different sentences from the input text, meaning. as well as different transformations of those sen- Figure1:Three-stageframeworkforautomaticquestiongeneration. tences, may be more or less likely to lead to and Andrew, 2006). These languages were also high-quality questions, in stage 3 the questions employed for QG by Gates (2008), though with a arescoredandrankedaccordingtofeaturesofthe differentapproach(cf. §3.4). source sentences, input sentences, the question, The Tregex language includes various rela- andthetransformationsusedin generation. Since tionaloperatorsbasedontheprimitiverelationsof many options are available at each stage, this is immediatedominance(denoted“<”)andimmedi- a “overgenerate-and-rank” strategy (Walker et al., ate precedence (denoted “.”). In addition to sup- 2001;LangkildeandKnight,1998). porting queries involving standard tree relations, Tregex also includes some linguistically moti- 3 Implementation vated constraints such as headship, constrained Wenextpresentthedetailsofourframework,im- dominance, and precedence (e.g., dominance plementing several operations at each of the three throughachainofverbphrases). Tsurgeonadds stages. We use techniques from summarization the ability to modify trees by relabeling, deleting, and sentence compression in stage 1, declarative moving,andinsertingnodes. question-generating rules motivated by research Queries in Tregex find nodes that match pat- in theoretical syntax in stage 2, and a statistical terns;thesenodesmaybenamedandmanipulated rerankerinstage3. using Tsurgeon operations. For example, the Tregex expression “SBAR < /ˆWH.*P$/ << 3.1 ConventionsandDefinitions NP|ADJP|VP|ADVP|PP=unmv” would find a The term “source sentence” refers to a sentence node labeled as one of a set of phrase types takendirectlyfromtheinputdocument,tobeused (“NP|ADJP|VP|ADVP|PP”) which is domi- to generate a question (e.g., Kenya is located in natedbyaclause(“SBAR”)thatalsodirectlydom- Africa.). In contrast, the term “derived sentence” inates a question phrase (“/ˆWH.*P\$/”). This referstoadeclarativesentencederivedinstage1. node can be retrieved through its user-assigned “Inputsentence”referstoasourceorderivedsen- name, “unmv” in this case. This rule is used tence to be given as input to the question trans- to identify the constraint on movement in clauses ducer. Theterm“answerphrase”referstophrases suchaswhetherJohnlikesMary. in declarative sentences which may serve as tar- Therulesinoursystemaremostlyimplemented gets for WH-movement, and therefore as possible using Tregex and Tsurgeon. In some cases, answers to generated questions (e.g., in Africa). however, we resort to transformations outside the The term “question phrase” refers to the phrase expressive power of Tsurgeon, which cannot containingthequestionwordreplacingananswer perform operations one at a time (all matching phrase(e.g.,WhereinWhereisKenyalocated?). nodesmustbetransformedsimultaneously),back- We use simplified Penn Treebank-style phrase reference relabeled nodes, or include reserved structuretrees,includingPOSandcategorylabels, words(e.g.,“insert”)innewnodelabels. as produced by the Stanford Parser (Klein and 3.3 TransformingSourceSentences(Stage1) Manning, 2003) to represent the syntactic struc- tureofsentences. Nounphraseheadsareselected We explored several transformations for stage 1, usingCollins’rules(Collins,1999). whichtakesasinputtheoriginaltextandproduces declarative derived sentences to be further modi- 3.2 SearchingandManipulatingTrees fiedbythequestiontransducer(§3.4). In order to implement the rules for transforming An analysis of the question transducer’s output sourcesentencesintoquestions, weuseTregex, when applied directly to sentences from source atreequerylanguage,andTsurgeon,atreema- texts revealed that complex sentences often lead nipulationlanguagebuiltontopofTregex(Levy to unnatural or even senseless questions. There- fore, various simplifying transformations are per- The system could be extended to detect and formed in stage 1 to remove phrase types such transform other types of phrases to produce other asleadingconjunctions,sentence-levelmodifying types of questions (e.g., how, why, and what kind phrases, and appositives. Such transformations of). It should be noted that the transformation have been utilized in previous work on headline from answer to question is achieved through a generation(DorrandZajic,2003)andsummariza- compositionofgeneral-purposerules. Thiswould tion (Toutanova et al., 2007). Table 1 describes allow,forexample,theadditionofarelativelysim- thesetransformations. ple rule to generate why questions by building off Further, the question transducer will not gener- of the existing rules for subject-auxiliary inver- ate questions about certain kinds of syntactically sion, verb decomposition, etc. In contrast, previ- embedded content (cf. Table 3). Questions can ous QG approaches have employed separate rules be produced about much of this embedded con- for specific sentence types (e.g., Mitkov and Ha, tent if we extract declarative sentences from fi- 2003;Gates,2008). niteclauses,relativeclauses,appositives,particip- For each sentence, many questions may be ial phrases. The sentence transformation module produced: there are often multiple possible an- implements these options. For example, it trans- swer phrases in a particular sentence, and multi- forms the sentence Selling snowballed because of ple question phrases for each answer phrase. For waves of automatic stop-loss orders, which are example,fromthesentenceFranciumwasdiscov- triggered by computer when prices fall to cer- ered by Marguerite Perey in France in 1939,2 the tainlevelsintoAutomaticstop-lossordersaretrig- transducerproducesthefollowingquestions: geredbycomputerwhenpricesfalltocertainlev- • Where was francium discovered by Marguerite els, from which stage 2 produces What are trig- Pereyin1939? geredbycomputerwhenpricesfalltocertainlev- els?. Table 2 describes the expressions and rules • When was francium discovered by Marguerite forextraction. PereyinFrance? We allowed for some of the operations in this • Was francium discovered by Marguerite Perey first stage to be fairly complex because they are inFrancein1939? less central to the whole system (for many sen- • By what was francium discovered in France in tences, these stage 1 rules do not fire at all). In 1939? contrast, the rules in stage 2, most of which are utilized for every generated question, maintain a Of course, the last question ought to be By higherdegreeofsimplicityandelegance. whomwas.... Thiserrorisduetothefailureofthe entityrecognitioncomponent,discussedin§3.4.2, 3.4 QuestionTransducer(Stage2) tocorrectlyidentifyMargueritePereyasaperson. In stage 2 of our implementation, the question Thus, this example also illustrates the importance transducer takes as input a declarative sentence of ranking to avoid questions whose features are and produces as output a set of possible ques- stronglyassociatedwitherrors(stage3,§3.5). tions.1 Itidentifiestheanswerphraseswhichmay The question transducer aims to overgenerate be targets for WH-movement and converts them grammatical, though perhaps irrelevant or unim- into question phrases. In the current system, an- portant, questions. These rules encode a sub- swerphrasescanbenounphrasesorprepositional stantial amount of linguistic knowledge. They phrases, which enables who, what, where, when, (1) mark phrases that cannot be answer phrases, andhowmuchquestions. due,forexample,toislandconstraints;(2)remove While many of the answer phrases turn out to eachanswerphraseandgeneratepossiblequestion bevalidanswerstothegeneratedquestions,some phrasesforit;(3)decomposethemainverb;(4)in- exhibit referential ambiguity (e.g., the planet in a vert the subject and auxiliary verb; and (5) insert textonthesolarsystem). Weleavethegeneration one of the question phrases. (Note that some of ofcorrectanswersanddistractorstofuturework. these steps do not apply in some cases.) We dis- cusseachoftheseinturn. 1The main clauses of declarative sentences are marked as “S” in the Penn Treebank. We leave inverted clauses 2From “Francium.” Wikipedia: The Free Encyclopedia. (“SINV”)forfuturework. RetrievedDec.16,2008. DescriptionofTransformation Expression Sentence-initialconjuctionsareremovedbydeletingconj. ROOT < (S < CC=conj) Sentence-initialadjunctphrasesareremovedbydeletingadjunct.(A ROOT < (S < (/[ˆ,]/=adjunct nearlyidenticalruledeletescommasfollowingtheseadjuncts.) $.. (/,/ $.. VP))) Appositives are removed by deleting app, lead, and trail. (A SBAR|VP|NP=app $, /,/=lead nearlyidenticalruledeletesparentheticalphrases.) $. /,/=trail !$ CC !$ CONJP Table1:Rulesusedinstage1tosimplifyandcompresssentences. DescriptionofTransformation Expression A new tree is constructed from an appositive phrase app mod- NP !< CC !< CONJP < (NP=noun $.. ifying a noun phrase noun by creating a sentence of the form (/,/ $.. (NP=app $.. /,/))) noun copula app,wherecopulaisinthepasttenseandagrees withtheheadofnoun. Anewtreeisconstructedfromafiniteclausebyplacingfiniteunder S=finite !>> NP|PP < NP < anewrootnode,withtheappropriatepunctuationpunct. Thereisa (VP < VBP|VB|VBZ|VBD|MD) checktoavoidextractingclausesdominatedbynounphrasesorprepo- ?< /\\./=punct sitionalphrases.(Extractingphrasesinsuchcasestoooftenledtovague orungrammaticalquestionsduringdevelopment.) Anewtreeisconstructedfromaverbalmodifier(e.g.,boughtbyJohn NP=noun > NP $.. VP=modifier inThisisthecarboughtbyJohn.) bycreatingasentenceoftheform noun copula modifier,wherecopulaisinthepasttenseand agreeswiththeheadofnoun. Anewtreeisconstructedfromarelativeclausebycreatingasentencein NP=noun > NP $.. (SBAR < whichnounisthesubjectorobjectoftheclauseinrelthatismissing S=rel !< WHADVP !< WHADJP) asubjectorobject.relclauseisrecursivelysearchedtofindaclause missinganounphraseoraverbphrasemissinganounphrase.Thenoun phrasemodifiedbytherelativeclauseisassumedtobethesubjector object of that clause even though noun phrases are not marked in the parsetrees. Themainclauseofthenewlyextractedsentenceisoften relclauseitself,butnotalways(e.g.,Marymetthemanwouldbe extractedfromThemanJohnsaidMarymet.) Table2:Rulesusedinstage1toextractsimplesentencesfrommorecomplexones. Next we described the sequence of operations In particular, the constraint which encodes that which the system performs to convert declarative nounphrasesareislandstomovementensures,for sentences into questions. This process is illus- example, that phrases in relative clauses cannot tratedinFigure2. undergo movement. For example, the spurious question ∗Who did I buy the book that inspired? 3.4.1 MarkingUnmovablePhrases willnotresultfromIboughtthebookthatinspired AsetofTregexexpressionsmarksthephrasesin Bob. It also disallows movement of prepositional aninputtreewhichcannotbeanswerphrasesdue phrase objects, in order to avoid two very similar to constraints on WH-movement. Each of those questionsandtosimplifytherules(e.g.,Towhom treenodesisrenamedbyextractingthenode’scur- didIgivethebook. andWhomdidIgivethebook rent label, prepending a special marker on the la- to.). bel (“UNMV-”), and relabeling the node with a Tsurgeon operation. The expressions are listed 3.4.2 GeneratingPossibleQuestionPhrases anddescribedinTable3. After marking unmovable phrases, the transducer Theoperationsatthisstepoccurinparallel(i.e., iterates over the possible answer phrases. For their ordering does not matter) and can therefore each one, it copies the input tree, then removes easily be extended or improved upon. There are theanswerphraseandgeneratespossiblequestion two exceptions to the parallel operation of rules phrases from it. (This step is skipped for yes-no which serve to propagate constraints down the questions.) 3 trees. These two rules are applied after all others. Thequestionphrasesforagivenanswerphrase The first marks as unmovable all nodes under an consistofaquestionword(e.g.,who,what,where, unmovablenode. Thesecondmarksasunmovable when), possibly preceded by a preposition and, in allnodesunderanotherwisemovablenode,which 3Thisprocessallowsustoextractapotentialansweralong encodes the fact that noun phrases are islands to witheachquestion. However, weleavetheexplorationand movement. evaluationofextractinganswersforfuturework. Constituentstomarkasunmovable Expression Adjunctclausesunderverbphrases.Suchphrasestypicallyfollowcom- VP < (S=unmv $,, /,/) mas. Clause-levelmodifiers,whichareanynodesdirectlyunderaclausal,or S < PP|ADJP|ADVP|S|SBAR=unmv “S”,nodeexceptnounsandverbs. Phrasesunderconjunctions. Asingleconjoinednodecannotundergo /\\.*/ < CC << WH-movementwhileitssiblingsdonot(e.g.,∗WhodidJohnmeetand NP|ADJP|VP|ADVP|PP=unmv Mary?fromJohnmetBobandMary.). Nounphrasesinadjunctclauses,assumingsubordinateclauseswithan SBAR < (IN|DT < /[ˆthat]/) explicit complementizer other than that are adjuncts (e.g., whether a << NP|PP=unmv recessionisonthehorizon). Phrasesunderaquestionphrase.Thisconstraintisrelatedtotheideaof SBAR < /ˆWH.*P$/ subjacency(Chomsky,1973),asisthenext. << NP|ADJP|VP|ADVP|PP=unmv Thesubjectofacomplementphrasewhenanexplicitcomplementizer SBAR <, IN|DT ispresent(e.g.,∗WhodidJohnsaythatisold?fromJohnsaidthatBob < (S < (NP=unmv !$,, VP)) isold.). Phrasesunderunmovablenodes. Thispropagatestheconstraintsdown @UNMV << NP|ADJP|VP|ADVP|PP=unmv thetree. Anynodesunderanotherwisemovablephrase. Thisencodesthecon- NP|PP|ADJP|ADVP straintthatnounphrasesareislandstomovement. << NP|ADJP|VP|ADVP|PP=unmv Optionalexpressionsforconservativerestrictionsonthesetofpossibleanswerphrases: Prepositional phrases that do not have a noun phrase object (e.g., on PP=unmv !< NP seeinganoldfriend). Bothofapairofnounphrasesthataresiblingsinthetree(e.g.,toavoid S < (NP=unmv $ NP|UNMV-NP) ∗WhodidJohngiveabook?fromJohngaveMaryabook.). Existentialtherenounphrases. Thisensuresagainstspuriousquestions NP=unmv < EX suchas“Whatwasadoginthepark?”beinggeneratedfromTherewas adoginthepark. Table3:TreesearchingexpressionsforidentifyingphraseswhichmaynotundergoWH-movement.TheprefixUNMV-identifies nodesalreadymarkedasunmovable. QuestionPhrase Conditions Who anounphrasewhoseheadislabeledPERSON,PER DESC,orORGANIZATION Where a noun phrase whose head is labeled LOCATION, or is a prepositional phrase with certain prepositions(in,at,on,over)whoseheadislabeledLOCATION When anounphrasewhoseheadislabeledDATEorTIME,orisaprepositionalphrasewithcertain prepositions(in,at,on,over)whoseheadislabeledDATEorTIME Howmuch a noun phrase with a quantifier phrase or word (“QP” or “CD”) and whose head is labeled MONEY A preposition followed aprepositionalphrasewhoseobjectisanounphrasethatsatisfiesoneoftheaboveconditions byanyoftheabove Table4:Conditionsforgeneratingquestionswithcertainquestionphrases. the case of question phrase like whose car, fol- For a given answer phrase, the system uses the lowedbytheheadoftheanswerphrase. phrase’s entity labels and syntactic structure to To generate the question phrases, the system generate a set of zero or more possible question annotates the source sentence with a set of en- phrases, each of which is used to generate a final tity types taken from the BBN Identifinder Text question sentence. Table 4 describes the condi- Suite (Bikel et al., 1999). The set of labels from tions under which each type of question phrase is BBN includes those used in standard named en- produced. tity recognition tasks (e.g., “PERSON,” “ORGA- 3.4.3 DecompositionoftheMainVerb NIZATION”asintheMUC-6NamedEntityTask) and their corresponding types for common nouns In order to perform subject-auxiliary inversion (e.g., “PER DESC,” “ORG DESC”). It also in- (§3.4.4), if an auxiliary verb or modal is not cludesdates,times,monetaryunits,andothers.4 present, the question transducer decomposes the main verb into the appropriate form of do and 4Weusetheopen-sourceStanfordNERtool(Finkeletal., the base form of the main verb. It then modifies 2005)toperformtheannotation. WeretrainedtheStanford NERmodelon102textsthatwerelabeledbytheproprietary the tree structure of the verb phrase accordingly. BBNIdentifindertool,essentiallybuildingaCRFmodelthat predicts the tags, albeit with noisy data. The training texts WikipediaasofDec. 16,2008. Performancewasnotevalu- wererandomlysampledfromthesetoffeaturedarticleson atedrigorouslybutjudgedadequateonmanualinspection. Purpose Expression Toidentifythemainverbfordecompositioninto ROOT < (S=clause < (VP=mainvp < aformof“do”andthebaseform. /VB.?/=tensed !< (VP < /VB.?/))). To identify themain clause for subject-auxiliary ROOT=root < (S=clause <+(/VP.*/) (VP < inversion. /(MD|VB.?)/=aux < (VP < /VB.?/=verb))) To identifythe mainclause forsubject auxiliary ROOT=root < (S=clause <+(/VP.*/) (VP < inversioninsentenceswithacopulaandnoaux- (/VB.?/=copula < is|are|was|were|am) !< VP)) iliary(e.g.,Thecurrency’svalueisfalling). Table5:Treesearchingexpressionsinthequestiontransducer. Figure2:ProcessoftransformingdeclarativesentencesintoquestionsintheQuestionTransducer(stage2). Parenthesesmark stepswhichmaynotbenecessaryforcertainquestions.∗markstepswhichmayproducezeroormultipleoutputs. For example, John saw Mary becomes John did is a non-subject noun phrase. The bottom two seeMarybeforetransformationintoWhodidJohn Tregex expressions in Table 5 identify the main see? ratherthan∗SawJohnMary?. clauseandtherelevantnodesintheverbphrase. Ifanauxiliaryverbisalreadypresent,however, Themainclausenode,initially“S”,isfirstrela- thisdecompositionisnotnecessary(e.g.,Johnhas beled as “SQ”, indicating that it is part of a ques- seen Mary. could lead to Who has John seen?). tion. Then the auxiliary or copula is moved so In such cases, the main verb phrase includes the that it becomes the first child. The “SQ” node is auxiliaryverbandanestedverbphrasecontaining then used to form a new tree for the sentence. In thebaseformofthemainverb. the case of yes-no questions, the root node of the Thesystemidentifiesmainverbsthatneedtobe question tree will have the “SQ” node as its only decomposed with the Tregex expression shown child. In the case of WH-questions, the root node atthetopofTable5. hasan“SBARQ”nodeasitschild. This“SBARQ” In order to convert between lemmas of verbs node then has the “SQ” node as a child, which and the different surface forms that correspond to will be preceded by a question phrase node (e.g., different parts of speech, we created a map from “WHNP”). pairsofverblemmaandpartofspeechtoverbsur- Aftertransformingthemainclauseandrelabel- face forms. We extracted all verbs and their parts ingit“SQ,”anyleadingadjunctphrasesunderthis of speech from the Penn Treebank (Marcus et al., node and preceding a comma are moved to the 1993). We lemmatized each verb first by check- front of the final sentence (e.g., to produce more ing morphological variants in WordNet (Miller et natural questions like Following Thomas Jeffer- al., 1990), and if a lemma was not found, then son, who was elected the 4th president? rather trimming the rightmost characters from the verb than the more awkward Who, following Thomas one at a time until a matching entry in WordNet Jefferson,waselectedthe4thpresident?). was found. This simple approach works in prac- tice for English because most verb forms either arederivedfromthelemmabyaddingafewletters 3.4.5 InsertingQuestionPhrases (e.g.,“ed”)oraremappedtolemmasinWordNet’s database. Each possible question phrase is inserted into a copy of the tree to produce a question. The ques- 3.4.4 Subject-AuxiliaryInversion tion phrase is inserted as a child of the “SBARQ” The transducer performs subject-auxiliary inver- node under the root node, following any leading sion either when the question to be generated sentence-leveladjunctphrases. Ifthequestionisa is a yes-no question or when the answer phrase yes-noquestion,thenthisstepisnotnecessary. QuestionDeficiency Description Ungrammatical ThequestiondoesnotappeartobeavalidEnglishsentence. Doesnotmakesense Thequestionisgrammaticalbutindecipherable.(e.g.,Whowastheinvestment?) Vague Thequestionistoovaguetoknowexactlywhatitisaskingabout,evenafterreadingthearticle (e.g.,WhatdidLincolndo?). Obviousanswer Thecorrectanswerwouldbeobviouseventosomeonewhohasnotreadthearticle(e.g.,the answerisobviouslythesubjectofthearticle,ortheanswerisclearlyyes). Missinganswer Theanswertothequestionisnotinthearticle. WrongWHword ThequestionwouldbeacceptableiftheWHphraseweredifferent(e.g.,inwhatversuswhere). WHphrasesincludewho,what,where,when,how,why,howmuch,whatkindof,etc. Formatting Thereareminorformattingerrors(e.g.,withrespecttocapitalization,punctuation) Table6:Possibledeficienciesageneratedquestionmayexhibit. 3.4.6 Post-processing Burr?5 Thatcertainfeaturesoftheinputandcertainop- Some additional post-processing mechanisms are erations are more strongly associated with errors necessary to ensure proper formatting and punc- suggests that scoring questions by some accept- tuation. Sentence-final periods are changed to ability measure may allow us to rank them effec- question marks. Additionally, the output is de- tively. In this section, we describe our implemen- tokenized to remove extra whitespace (e.g., pre- tationofamoduletoscoreandrankquestions. cedingpunctuationsymbols). We observed in preliminary output of our sys- 3.5.1 Model tem that nearly all of the questions including pro- We use a discriminative reranker (Collins, 2000), nounsweretoovague(e.g.,Whatdoesithaveasa specifically based on a logistic regression model head of state? from a Simple English Wikipedia that defines a probability of acceptability, given article on Thailand.). We therefore filtered all the question q and source text t: p(· | q,t). We questionswithpersonalpronouns,possessivepro- useatodenote“acceptable”andutodenote“un- nouns, and noun phrases consisting solely of de- acceptable”;p(u | q,t) = 1−p(a | q,t). terminers (e.g., those), which eliminated a sub- Questions may be deficient, or unacceptable, stantial portion of the possible questions our sys- in various ways. In Table 6, we enumerate a tem might output. This suggests a possible fu- set of possible deficiencies that humans may be ture extension to leverage coreference resolution able to identify. In our first ranking approach andreferringexpressiongenerationtoreplacepro- (“Boolean”),wecollapsethepossibledeficiencies nounswithreferringexpressions. sothatquestionswithanyofthepossibledeficien- ciesaretreatedasunacceptable,andlearnalogis- 3.5 QuestionRanker(Stage3) ticregressionmodelover{a,u}. Inoursecondapproach(“Aggregate”),welearn Thequestiontransducerandthesentencetransfor- separateconditionalmodelsoftheprobabilityofa mation stages overgenerate, producing erroneous givenquestionbeingacceptableaccordingtoeach questions for various reasons. We already noted ofthedeficienciesinTable6,thencombinethem: anexamplerelatedtoentityrecognitionin§3.4. p(a | q,t) = (cid:81)K p (a | q,t) (1) Anotherexampleisthaterrorsduringautomatic i=1 i i parsing may propagate. If the sentence Bob do- where i indexes different types of acceptability natedthebookinhisbackpacktothelibrarywere (with respect to grammaticality, making sense, parsed such that the phrase in his backpack at- vagueness, etc.), and K is the number of types, tached to the verb donated rather than book, the 8inourcase. erroneous question ∗What did Bob donate in his 3.5.2 FeaturesandParameterEstimation backpacktothelibrary? wouldresult. Bothrankingmodelsusethesamefeatures,which The sentence transformation module in stage 1 arelistedinTable7. isparticularlypronetoerrors. Forexample,itdoes We do not claim that this is the optimal set of not consider negation or semantics when extract- features. That is, the ranking model could cer- ing finite clauses, and thus for the sentence John tainly be improved by refining the feature set. never believed that Hamilton shot Aaron Burr, it suggests the misleading question Who shot Aaron 5Infact,AaronBurrshotAlexanderHamilton. Feature generated from the text. They worked in a web- Thenumbersoftokensinthesourcesentence,question based interface and could re-read the article and sentence,andanswerphrase. Themeanunigramlanguagemodelprobabilitiesofthe alter any of their previous ratings as they saw fit. sourcesentenceandanswerphrase. They were asked to consider each question inde- Binaryvariablesindicatingwhethertheinputsentence pendently, such that similar questions about the wasderivedfromarelativeclause,aparticipialphrase, an appositive phrase, or a simplified version of the sameinformationwouldreceivesimilarratings. sourcesentence. Three people rated each of the questions in the AbinaryvariableindicatingtheuseofeachpossibleWH test set. Since the problem deficiencies are not word,orwhetherthequestionwasayes-noquestion. Theparseloglikelihoodofthesourcesentence,aswell mutually exclusive, to estimate inter-rater agree- astheloglikelihoodnormalizedbylength. ment we computed separate Fleiss’s κ values for Binary variables indicating the presence of different each deficiency. The values are given in Table 8. partsofspeechintheanswerphrase. Abinaryvariableindicatingwhethertheanswerphrase Theκvalueof0.42forthe“Acceptable”category isaprepositionalphraseoranounphrase. corresponds to “moderate agreement”(Landis and Abinaryvariableindicatingwhethertheanswerphrase Koch, 1977). The agreement is lower than other is the subject in the input sentence (e.g., requires do- inversion). rating schemes,7 due in part to the rating scheme Binary variables indicating the main predicate’s tense but also to the fact that the raters were novices (past,present,future). (note, for instance, that κ is only 0.495 for “for- Thenumberofnounphrasesinthequestion. Thenumberofprepositionalphrasesinthequestion. matting”). However, the fact that 61.5% of the A binary variable indicating the presence of negation test-set question ratings for the “acceptable” cat- wordsinthesourcesentence(not,no,andnever). egorywereunanimousgivessomeconfidence. Table7:Featuresforrankingquestions. Primarily, we report results based on major- ity ratings (i.e., the rating assigned by 2 of the 3 raters). However, since agreement was moderate, In particular, features based on n-gram language and since, in spot-checks, we observed that raters modelsofquestionsmightprovideusefulinforma- appearedmorelikelytoliberallyacceptbadques- tionifanappropriatecorpusofquestionscouldbe tions than to reject good ones, we also report re- found. We did not explore this particular type of sultsinwhichaquestionisdeemedtohaveapar- featureforourexperimentsbecausefromobserva- ticular deficieny if any of its three raters labeled tions of preliminary output, it appeared that most it as having that deficiency. This second set of syntacticerrorswererelatedtolongerdistancede- measurementsprovidesuswithanestimateofthe pendenciesthanwhatwouldbemodeledbyann- lowerboundonthequalityofgeneratedquestions. gram model. The ranker could also include fea- In addition to the test set, we created a training tures for specific constructions that might appear datasetforlearningtorankquestions. Inthetrain- in input sentences, or even features tailored to a ingset,eacharticle’squestionswereratedbyonly particularQGapplication. oneperson. We estimate the parameters by optimizing the regularized log-likelihood of the training data (cf. 4.1 Corpora §4.1), with the regularization constant selected The training and test data sets consisted of ques- throughcross-validation. tionsaboutarticlesfrom4corpora. 4 Evaluation One corpus, (WIKI-ENG) was a random sam- ple from the featured articles in the English No “standard” evaluation task yet exists for QG. Wikipedia8 that had between 250 and 2,000 word To evaluate our implemented QG framework, tokens. This English Wikipedia corpus provides we conducted an experiment in which 15 native expository texts written at an adult reading level English-speakinguniversitystudentsratedthesys- fromavarietyofdomains,whichroughlyapproxi- tem’soutput,indicatingwhethereachquestionex- matestheprosethatasecondaryorpost-secondary hibitedanyofthedeficiencieslistedinTable6.6 7E.g., Dolan and Brockett (2005) and Glickman et al. Annotatorswereaskedtoreadthetextofanar- (2005)reportκvaluesaround.6forparaphraseidentification ticle and then rate approximately 100 questions andtextualentailment,respectively. 8The English and Simple English Wikipedia data were 6The ratings from one person were excluded due to an downloaded on December 16, 2008 from http://en. extremelyhighrateofacceptingquestionsaserror-freeand wikipedia.org and http://simple.wikipedia. otherirregularities. org,respectively.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.