ebook img

On (Commercial) Benefits of Automatic Text Summarization Systems in the News Domain: A Case of Media Monitoring and Media Response Analysis PDF

0.23 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview On (Commercial) Benefits of Automatic Text Summarization Systems in the News Domain: A Case of Media Monitoring and Media Response Analysis

On (Commercial) Benefits of Automatic Text Summarization Systems in the News Domain: A Case of Media Monitoring and Media Response Analysis PashutanModaresi and PhilippGross and SiavashSefidrodi and MirjaEckhof pressrelationsGmbH D-40211Düsseldorf,Germany [first name].[last name]@pressrelations.de StefanConrad InstituteofComputerScience HeinrichHeineUniversityofDüsseldorf [email protected] 7 1 0 2 Abstract odsthatselectthemostsalientsentencesinadoc- n ument and by abstractive we mean methods that a In this work, we present the results of a J incorporate language generation to reformulate a systematic study to investigate the (com- 3 documentinareductiveway. mercial)benefitsofautomatictextsumma- Automatictextsummarizationhasbeenapplied ] rization systems in a real world scenario. L to many domains, among which is the news do- More specifically, we define a use case in C main the focus of this work. Despite many at- . the context of media monitoring and me- temptstoimprovetheperformanceofsummariza- s c dia response analysis and claim that even tion systems (Ferreira et al., 2016; Wei and Gao, [ using a simple query-based extractive ap- 2015), to the best knowledge of the authors, no 1 proach can dramatically save the process- systematicstudywasperformedtoinvestigatethe v ingtimeoftheemployeeswithoutsignifi- (commercial) benefits of the summarization sys- 8 cantlyreducingthequalityoftheirwork. 2 temsinarealworldscenario. 7 We claim that using (even very) simple au- 0 1 Introduction 0 tomatic summarization systems can dramatically . Automatic text summarization has been an evolv- improve the workflow of employees without af- 1 0 ing field of research. Having started with the pio- fecting their quality of work. To investigate our 7 neering work of Luhn (Luhn, 1958), specifically claimwedefineausecaseinthecontextofmedia 1 : in recent years, automatic text summarization has monitoring and media response analysis (Section v made remarkable signs of progress with the pop- 2)andestablishseveralcriteriatomeasuretheef- i X ularity of deep learning approaches (Rush et al., fectiveness of the summarization systems in our r 2015;Chopraetal.,2016). use case (Section 3). In Section 4 we discuss the a Providing a formal definition of automatic text design of our experiment and report the results in summarization is rather a challenging task. This Section 5. Finally, in Section 6 we conclude our workpursuesthefollowingdefinition: Givenaset work. Q of queries, automatic text summarization is a 2 UseCaseDefinition reductive transformation of a collection of docu- ments D with |D| ą 0 into a single or multiple We investigate the (commercial) benefits of inte- target document(s), where the target document(s) gratinganautomaticsummarizationsysteminthe are more readable than the documents in D and semi-automatic workflows of media analysts do- contain the relevant information of D according ing media monitoring and media response analy- to Q (Modaresi and Conrad, 2015). This defi- sis at pressrelations GmbH1. In the following, we nition, comprises both extractive and abstractive approaches, where by extractive we mean meth- 1http://www.pressrelations.de/ shortlydefinethetermsasmentionedabove. is determined based on a query. For instance, the The goal of media monitoring is to gather all querymightbeanamedentity,andweexpectthat relevant information on specific topics, compa- the summary contains all relevant information re- nies or organizations. To this end, search queries gardingthenamedentity. are defined, with which the massive amount of The term readability refers to the coherence available information can be filtered automati- and the grammatical correctness of the summary. cally. Typically, in a post-processing step, the While the grammatical correctness is defined at qualityofthegatheredinformationisincreasedus- the sentence level, the coherence of the summary ingmanualfilteringbytrainedmediaanalysts. is determined on the whole text. That means that In media response analysis, the publications in the sentences of the summary should not only be the media (print media, radio, television, and on- grammatically correct in isolation, but also they linemedia)areevaluatedaccordingtovariouspre- mustbecoherenttomakethesummaryreadable. definedcriteria. Asaresultofthis,itispossibleto Both completeness and readability are criteria deducewhetherandhowjournalistshaverecorded that are difficult to evaluate and define formally, andprocessedthePR(PublicRelations)messages. andithasbeenshownthattheyarebothverysub- Possiblequestionstobeansweredinthecontextof jectivecriteria,wheretheirassessmentvariesfrom mediaresponseanalysisare: Howarethepublica- person to person (Torres-Moreno, 2014). In the tions distributed over time? How many listeners, caseofcompleteness, itisunclearhowtoformal- viewersorreaderswerepotentiallyreached? What izetherelevanceofinformation,andininthecase arethetonalityandopiniontendencyofthepubli- of readability the same holds for the concept of cations? (Grupe,2011) coherence. Typically, analysis results are given to the Therefore, we define the quality of a summary clientsintheformofextensivereports. Inthecase fromapracticalandcommercialpointofview. For of textual media, the immense amount of time re- this, we define the quality of a summary in terms quired to read texts and to write abstracts and re- of a binary decision problem where the question portsisahighcostfactorinthepreparationofme- to be asked is: can the produced summary in its diaresonanceanalysisreports. currentformbedeliveredtoacustomerornot? Weclaimthatthedescribedprocesscanbepar- Furthermore,inourusecase,thegainintimeis tially optimized by incorporating automatic sum- defined as the processing time that can be saved marization systems, leading to remarkable finan- by media analysts, assisted by a summarization cialadvantagesforthecompanies. system. It should be clear that the reduction of the processing time could lead to the reduction of 3 EvaluationCriteria costsinacompany. In the following section, the design of our ex- Fromthecommercialandacademicpointofview, periment with respect to the criteria mentioned thequalityofthesummariesplaysanimportrole. above(qualityandgainintime)willbeexplained. VariousautomaticmethodssuchasROUGE(Lin, 2004), BLEU (Papineni et al., 2002), and Pyra- 4 ExperimentSetup mid (Nenkova et al., 2007) have been used suc- cessfullytoevaluatethequalityofthesummaries. Toconductourexperimentsweincorporatedeight Moreover, manualevaluationhasalsobeenincor- media analysts (specialists in writing summaries porated for quality assessment of the summaries for customers) and divided them into two equi- (Modaresi and Conrad, 2014). Another important sized groups. One group received only the news criterionthatismostlyneglectedinacademicpub- articles(GroupA),andtheotheronereceivedonly licationsisthegainintime,definedastheamount thequery-basedextractedsummarieswithouthav- of saved time by a user through the usage of the ing access to the original articles. Given a query summaries. consisting of a single named entity, both groups In our use case, the quality of a summary com- wereaskedtowritesummarieswiththefollowing prisesoftwoaspects: completenessandreadabil- properties: ity. Thetermcompleteness,describestherequire- ment of a summary to contain all relevant infor- • Thesummaryshouldbecompactandconsist mationofanarticle. Therelevanceofinformation ofmaximumtwosentences. • The summary should contain the main topic Algorithm1Query-basedSummarization ofthearticleandalsothemostrelevantinfor- 1: procedureSUMMARIZE(T, Q) mationregardingthequery. 2: S Ð H 3: T1 Ð SegmentpTq As previously stated, the summaries created by 4: E Ð EntityDistributionpT1q mediaanalystswereevaluatedbasedontwocrite- 5: m Ð MedianpEq ria: qualityandgainintime. Thegainintimewas 6: E1 Ð H measured automatically using a web interface by 7: fore in Edo trackingtheprocessingtimeofthemediaanalysis 8: iffreq(e)>mthen for creating the text summaries. We interpret the 9: E1 Ð E1Ye gain in time as the answer to the question: In av- 10: S Ð LeadpT1q erage,whatpercentagefaster/slowerisgroupAin 11: S Ð QueryMatchpT1,Qq comparetogroupB?. Lett˜Aandt˜Bbetheaverage 12: S Ð CentralEntityMatchpT1,E1q processingtimesofthemediaanalystsingroupA 13: returnS and B respectively. We define gain in time as in Equation1. The pseudocode of the invoked query-based ex- tractivesummarizerisdepictedinAlgorithm1. $ ˇ ˇ &100¨ˇˇ1´ t˜Aˇˇ ift˜ ď t˜ In line 2 the summary S is initialized with an gpt˜A,t˜Bq “ %100¨ˇˇˇ1´ tt˜˜t˜BBAˇˇˇ ift˜AA ą t˜BB (1) memenptteydsient.toGseivnetnentcheesiannpdutsttoerxetdTin, tthheeltiesxttTis1(sleinge- 2). In line 3, the named entities of the text are Notice that it holds gpt˜ ,t˜ q “ gpt˜ ,t˜ q and g recognized and stored in a dictionary where each A B B A reflects only the magnitude of the saved time and key represent a named entity, and its correspond- not its direction. The direction can be determined ing value is the frequency of the named entity in basedonthevaluesoft˜ andt˜ . the text. Lines 5-9 depict the procedure to select A B Ontheotherhand,thequalityofthesummaries central named entities. Let m be the median of was evaluated by a curator (an experienced me- the named entities frequencies. A named entity e dia analyst in direct contact with customers). The is called a central named entity if its frequency in curator was responsible for evaluating the sum- the text is higher than the twice of the median. In maries created by media analysts in both groups line 10 we add the lead of the news article to the and scored them with a zero or a one. With zero summary, as the lead usually can be interpreted meaning that the quality of the summary is not as a compact summary of the whole article. Af- sufficient and the product cannot be delivered to terwards in line 11, the sentences that contain the theclientandwithonemeaningthequalityofthe queryQareaddedtothesummary. Finally,weex- summaryissufficientenoughtobedeliveredtothe tendthelistofsummarysentenceswithsentences customer. Let the vector q of size m be a one-hot containing the central named entities and return vector consisting of 0s and 1s, where the i-th ele- thesummary. ment in q represents the evaluation of the curator 5 Results forthei-thsummaryamongthemavailablesum- maries. Giventhat, wecomputetheaveragesum- Intotal,wecollected80summariescreatedbythe mary quality of a set of summaries by computing mediaanalystsinbothgroups. Foreachsummary, theaverageofitscorrespondingevaluationvector its processing time and its quality evaluated by a q. curatorwasrecorded. Basedonthecollecteddata, In total, ten news articles were provided to the weansweredthefollowingquestions: media analysts. The articles for group A had an 1. Intergroupprocessingtime: Isthereasignifi- averagewordcountof1438withthestandardde- cantdifferencebetweentheprocessingtimes viation being 497. Group B received only the ofindividualmediaanalystsinagroup? summaries of the articles, created automatically with a heuristic-based approach. The automati- 2. Intergroup quality: Is there a significant dif- cally generated summaries had an average length ference between the quality of the created of81wordswiththestandarddeviationbeing23. summariesbythemediaanalystsinagroup? 3. Intragroupprocessingtime: Isthereasignifi- the chosen significance level α “ 0.05. Thus the cantdifferentbetweentheaverageprocessing null hypothesis will be rejected for A3, A4, and timesofmediaanalystsingroupsAandB?If B1,meaningthattheprocessingtimesofthemare so,whichgrouphasafasterprocessingtime? not normally distributed. For other media ana- lysts, the normality assumption holds. Although 4. Intragroup quality: Is there a significant dif- in several cases the normality requirement of the ference between the average qualities of cre- ANOVA test is violated, it is still possible to use ated summaries by media analysts in groups theANOVAtest,asitwasshownthattheANOVA A and B? If so, which group created more test is relatively robust to the violation of the nor- qualitativesummaries? malityrequirement(Kirk,2012). ThesecondrequirementtoperformtheANOVA The remaining of this section reports the an- test is that the processing times of the media ana- swerstotheabovequestions. lysts have equal variances. For this, we use the 5.1 IntergroupProcessingTime Bartlett’s test (Dalgaard, 2008) with the null hy- pothesisthattheprocessingtimesofthemediaan- The processing times of the media analysts in alysts have the same variance. The results of the group A (A1-A4) and group B (B1-B4) are visu- Bartlett’s test for groups A and B are reported in alizedusingboxplotsinFigures1aand1brespec- Table2 tively. In both groups, the differences among the averageprocessingtimesareobservable. Ourgoal Group χ2 p-value is to investigate whether the differences between A 4.3726 0.2239 the processing times of media analysts is statisti- B 6.9013 0.0751 callysignificant. To compare the means of processing times Table2: Bartlett’sTestforProcessingTimes among the media analysts in a group we use the one-wayanalysisofvariance(one-wayANOVA). In Table 2, χ2 is the test statistic and we re- ThenullhypothesisintheANOVAtestisthatthe ject the null hypothesis if the p-value is less than mean processing times of the media analysts in the chosen significance level α “ 0.05. For both a group are the same. To perform the ANOVA groups,thep-valueisgreaterthanthesignificance test we first examine if the requirements of the level, and thus there is no evidence that the vari- ANOVAtestaresatisfied(Miller,1997). ances of processing times of individual media an- ThefirstrequirementoftheANOVAtestisthat alystsaredifferent. the processing times of the individual media ana- Having investigated the assumptions of the lystsarenormallydistributed. Forthis,weusethe ANOVA test, we now report the results of the Shapiro-Wilk test (Shapiro and Wilk, 1965) with ANOVAtest(SeeTable3). thenullhypothesisbeingthattheprocessingtimes are normally distributed. Table 1 reports the re- Group Fvalue p-value sultsofthetest. A 1.413¨1033 ă2¨10´16 B 4.2¨1034 ă2¨10´16 MediaAnalyst W p-value A1 0.90244 0.233 Table3: ANOVATestforProcessingTimes A2 0.91638 0.3277 A3 0.76592 0.0055 A4 0.73605 0.0024 InTable3, theFvalueistheFteststatisticand B1 0.85143 0.0604 we reject the null hypothesis if the p-value is less B2 0.95625 0.7425 than the chosen alpha level α “ 0.05. Thus, the B3 0.94609 0.6226 meanprocessingtimesofmediaanalystsingroup B4 0.93536 0.5026 A are not the same and there is a significant dif- ference between them. The same hold for group Table1: Shapiro-WilkTestforProcessingTimes B. The so far shown results crystallize an impor- In Table 1, W is the test statistic and we re- tantpropertyofthesummarizationprocess. Given ject the null hypothesis if the p-value is less than the same set of news articles and the same brief- 14 15 es 12 ut erage Processing Time in Min150 Processing Time in Minutes14680 v A 2 0 0 A1 A2 A3 A4 B1 B2 B3 B4 Group A Group B Media Analysts Media Analysts (a)ProcessingTimesofIndividualMediaAnalysts (b)ProcessingTimesofGroupsAandB Figure1: ComparisonoftheProcessingTimes ingtoallmediaanalysts,theaveragetimerequired other, we use the Fisher’s Exact Test (Dalgaard, by the media analysts within a group to summa- 2008) with the null hypothesis that the qualities rizethearticlesissignificantlydifferentfromeach arenotdifferentfromeachother. ForgroupAwe other. have a p-value of 0.1087, and for group B the p- valueis0.0022. Thusthenullhypothesiscanonly 5.2 IntergroupQuality be rejected for group B. The so far shown results The results of the manual evaluation of the sum- lead us to the following conclusions: Given the maries by the curator are represented in Table 4. news articles, no significant difference among the In this section, our goal is to systematically in- qualities of the produced summaries by the me- vestigate whether the qualities of the summaries dia analysts can be observed. Furthermore, given producedbymediaanalystsinagrouparesignifi- onlytheautomaticallycreatedsummaries,theme- cantlydifferentfromeachother. diaanalystsproducesummarieswithsignificantly Different from the previous section where we differentqualities. compared the processing times of the media ana- lysts in a group using the ANOVA test, the com- 5.3 IntragroupProcessingTime parison of the qualities among the media analysts So far, we only investigated the intergroup prop- cannot be performed using the ANOVA test (due erties. In this section, we answer the question to the huge violation of the normality assump- whether there exists a significant difference be- tion). Therefore, we interpret the evaluation re- tweentheaverageprocessingtimesofgroupAand sults of each media analyst as a Binomial dis- B? tribution Bpn,pq with n “ 10 (number of arti- InFigure1b,theprocessingtimesofthegroups clesshowntoeachmediaanalyst)andpbeingthe A and B are compared using boxplots. Using the numbersoftimesthecuratorwassatisfiedwiththe Equation1,wecomputethegainintimeforgroup qualityofthesummariescreatedbythemediaan- B,thatisroughly58%, meaningthatasexpected, alyst. the media analysts in group B required much less timetocreatethesummariesincomparetotheme- GroupA GroupB diaanalystsingroupA.SimilartotheSection5.1, A1 A2 A3 A4 B1 B2 B3 B4 Quality 0.9 1.0 0.8 0.6 0.3 0.9 1.0 0.7 we use the ANOVA test to check the significance Overall 0.82 0.72 ofthisoutcome. Theresultsofthetestarereported inTable5. Table4: ResultsofManualEvaluationofQuality In Table 5, F value is the F test statistic and we rejectthenullhypothesisifthep-valueislessthan To test whether the qualities of the produced the chosen alpha level α “ 0.05. Thus, the pro- summaries are significantly different from each cessingtimesofmediaanalystsingroupBaresig- Group Fvalue p-value • The media analysts had a significant gain in Avs.B 2.14¨1033 ă2¨2´16 time by the process of creating the text sum- maries(58%). Table 5: ANOVA Test for Intragroup Processing • Providing the media analysts with automati- Times callycreatedsummariesdoesnothaveaneg- ative impact on the quality of the summaries nificantly lower than the processing times of me- theygenerated diaanalystsingroupA. Theresultsmentionedaboveindicatethatincor- The results show that using a simple query- porating even simple summarization systems can basedextractivesummarizationsystem,themedia dramaticallyimprovetheworkflowoftheemploy- analysts had a significant gain in time by the pro- ees. cessofcreatingthetextsummaries. For future work we plan to repeat our experi- 5.4 IntragroupQuality ment with more sophisticated summarization al- gorithmsandcomparethegainintimetoourbase- Inthefinalstep,wecomparethequalityofthepro- line setting. Furthermore, we plan to increase the ducedsummariesbetweenbothgroupsandanswer number of media analysts to obtain more reliable the question whether there is a significant differ- results. ence between the qualities? To answer this ques- tionweperformtheFisher’sExactTestandobtain Acknowledgments thep-valueof0.4225. Thusthenullhypothesisof the test cannot be rejected and we conclude that This work was funded by the German Federal thequalitiesofthesummariesamongbothgroups MinistryofEconomicsandTechnologyunderthe arenotsignificantlydifferent. ZIMprogram(GrantNo. KF2846504). Using the results above, we conclude that pro- viding the media analysts with automatically cre- References atedsummariesdoesnothaveanegativeimpacton thequalityofthesummariestheygeneratedandno [Chopraetal.2016] Sumit Chopra, Michael Auli, and Alexander M. Rush. 2016. Abstractive Sen- significantdifferenceinqualitycouldbeobserved tenceSummarizationwithAttentiveRecurrentNeu- in compare to the media analysts that had access ralNetworks. InNAACLHLT2016,The2016Con- tothefullnewarticles. ferenceoftheNorthAmericanChapteroftheAsso- ciationforComputationalLinguistics,pages93–98. 6 Conclusions [Dalgaard2008] P.Dalgaard. 2008. IntroductoryStatis- ticswithR. StatisticsandComputing.SpringerNew To investigate the (commercial) benefits of the York. summarization systems, we designed an experi- ment where two groups of media analysts were [Ferreiraetal.2016] Rodolfo Ferreira, Rafael Ferreira, Rafael Dueire Lins, Hilário Oliveira, Marcelo Riss, given the task to summarize news articles. Group and Steven J. Simske. 2016. Applying Link Tar- A received the whole news articles and group B getIdentificationandContentExtractiontoImprove received only the automatically created text sum- Web News Summarization. In Proceedings of the maries. Insummary,weshowedthat: 2016 ACM Symposium on Document Engineering, DocEng’16,pages197–200.ACM. • Theaveragetimerequiredbythemediaana- [Grupe2011] S. Grupe. 2011. Public Relations: Ein lystswithinagrouptosummarizethearticles Wegweiser für die PR-Praxis. Springer Berlin Hei- issignificantlydifferentfromeachother. delberg. [Kirk2012] R.E. Kirk. 2012. Experimental Design: • Given the news articles, no significant dif- ProceduresfortheBehavioralSciences:Procedures ference among the qualities of the produced fortheBehavioralSciences. SAGEPublications. summaries by the media analysts can be ob- served. Furthermore,givenonlytheautomat- [Lin2004] Chin-Yew Lin. 2004. ROUGE: A Pack- age for Automatic Evaluation of Summaries. In ically created summaries, the media analysts Text Summarization Branches Out: Proceedings of produce summaries with significantly differ- theACL-04Workshop,pages74–81.Associationfor entqualities. ComputationalLinguistics. [Luhn1958] H.P.Luhn. 1958. TheAutomaticCreation ofLiteratureAbstracts. IBMJ.Res.Dev.,2(2):159– 165,April. [Miller1997] R.G. Miller. 1997. Beyond ANOVA: Ba- sics of Applied Statistics. Chapman & Hall/CRC TextsinStatisticalScience.Taylor&Francis. [ModaresiandConrad2014] Pashutan Modaresi and StefanConrad. 2014. FromPhrasestoKeyphrases: An Unsupervised Fuzzy Set Approach to Summa- rizeNewsArticles. InProceedingsofthe12thInter- national Conference on Advances in Mobile Com- putingandMultimedia,pages336–341. [ModaresiandConrad2015] Pashutan Modaresi and Stefan Conrad. 2015. On Definition of Automatic Text Summarization. In Proceedings of Second In- ternationalConferenceonDigitalInformationPro- cessing, Data Mining, and Wireless Communica- tions,DIPDMWC2015,pages33–40.SDIWC. [Nenkovaetal.2007] Ani Nenkova, Rebecca Passon- neau, and Kathleen McKeown. 2007. The Pyra- mid Method: Incorporating Human Content Selec- tion Variation in Summarization Evaluation. ACM Trans.SpeechLang.Process.,4(2). [Papinenietal.2002] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual MeetingonAssociationforComputationalLinguis- tics,ACL’02,pages311–318. [Rushetal.2015] Alexander M. Rush, Sumit Chopra, andJasonWeston. 2015. ANeuralAttentionModel for Abstractive Sentence Summarization. CoRR, abs/1509.00685. [ShapiroandWilk1965] S.S.ShapiroandM.B.Wilk. 1965. An Analysis of Variance Test for Normality (CompleteSamples). Biometrika,52(3/4):591–611. [Torres-Moreno2014] Juan-Manuel Torres-Moreno. 2014. AutomaticTextSummarization. Wiley. [WeiandGao2015] ZhongyuWeiandWeiGao. 2015. Gibberish, Assistant, or Master?: Using Tweets Linking to News for Extractive Single-Document Summarization. In Proceedings of the 38th Inter- national ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’15, pages1003–1006.ACM.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.