
Text Processing: FIRE 2016 International Workshop, Kolkata, India, December 7–10, 2016, Revised Selected Papers

228 Pages·2018·13.529 MB·English


Prasenjit Majumder · Mandar Mitra · Parth Mehta · Jainisha Sankhavara (Eds.)

Text Processing
FIRE 2016 International Workshop
Kolkata, India, December 7–10, 2016
Revised Selected Papers

Lecture Notes in Computer Science 10478
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zurich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7409

Editors
Prasenjit Majumder, DAIICT, Gujarat, India
Parth Mehta, DAIICT, Gujarat, India
Mandar Mitra, Indian Statistical Institute, Kolkata, India
Jainisha Sankhavara, DAIICT, Gujarat, India

ISSN 0302-9743, ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-73605-1, ISBN 978-3-319-73606-8 (eBook)
https://doi.org/10.1007/978-3-319-73606-8
Library of Congress Control Number: 2017963769
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer International Publishing AG 2018

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

We are happy to present the current edited volume on "Advances in Text Processing." This volume comprises 16 papers from the seven tracks offered at FIRE 2016: Consumer Health Information Search (CHIS), Detecting Paraphrases in Indian Languages (DPIL), Information Extraction from Microblogs Posted During Disasters, Persian Plagiarism Detection (PersianPlagDet), Personality Recognition in Source Code (PR-SOCO), Shared Task on Mixed Script Information Retrieval (MSIR), and Shared Task on Code Mix Entity Extraction in Indian Languages (CMEE-IL).

Persian language text processing was a significant inclusion this year, in addition to the major Indian languages. FIRE was enriched by including an annotated Persian corpus and five papers. The next major contribution in this volume is on the DPIL task. Data on four Indian languages were presented (Tamil, Malayalam, Hindi, and Punjabi), and four papers on these languages are included in this volume. We included two papers each on CHIS and MSIR. Moreover, we included one paper for each of the following tracks: PR-SOCO, CMEE-IL, and Microblog.

We invited top teams to submit their papers for this book and received 19 papers. All submissions underwent a review process, after which 16 papers were selected for inclusion. We would like to thank the track organizers and reviewers for helping us in the selection process with their insightful opinions and feedback.

November 2017
Prasenjit Majumder
Mandar Mitra
Parth Mehta
Jainisha Sankhavara

Organization

FIRE 2016, the Forum for Information Retrieval Evaluation, was organized by the Indian Statistical Institute, Kolkata.
General Chairs
Mandar Mitra, Indian Statistical Institute Kolkata, India
Prasenjit Majumder, Dhirubhai Ambani Institute of ICT, India

Program Committee
Habibollah Asghari, ICT Research Institute, Iran
Somnath Banerjee, Jadavpur University, India
Debasis Ganguly, Dublin City University, Ireland
Saptarshi Ghosh, Indian Institute of Technology Kharagpur, India
Kripabandhu Ghosh, Indian Institute of Technology Kanpur, India
Mark Michael Hall, Edge Hill University, UK
Ben Heuwing, Universität Hildesheim, Germany
Gareth Jones, Dublin City University, Ireland
Anand Kumar M., Amrita Vishwa Vidyapeetham, India
Sobha Lalitha, AU-KBC Research Centre, India
Johannes Leveling, Elsevier, Germany
Mihai Lupu, Vienna University of Technology, Austria
Debapriyo Majumdar, Indian Statistical Institute Kolkata, India
Thomas Mandl, Universität Hildesheim, Germany
Parth Mehta, Dhirubhai Ambani Institute of ICT, India
Henning Müller, University of Applied Sciences Western Switzerland, Switzerland
Jiaul Paik, Indian Institute of Technology Kharagpur, India
Girish Palshikar, Tata Research Development and Design Centre, India
Swapan K. Parui, Indian Statistical Institute Kolkata, India
Paulo Quaresma, Universidade de Evora, Portugal
Nitin Ramrakhiyani, Tata Research Development and Design Centre, India
Pattabhi Rao, AU-KBC Research Centre, India
Paolo Rosso, Universitat Politècnica de València, Spain
Rishiraj Saha Roy, Max Planck Institute for Informatics, Germany
Jainisha Sankhavara, Dhirubhai Ambani Institute of ICT, India
Manjira Sinha, Conduent Labs, India
Jerome White, New York University, Abu Dhabi
David Zellhoefer, Berlin State Library, Germany

Contents

PAN@FIRE: Overview of the PR-SOCO Track on Personality Recognition in SOurce COde
    Francisco Rangel, Fabio González, Felipe Restrepo, Manuel Montes, and Paolo Rosso
Microblog Retrieval During Disasters: Comparative Evaluation of IR Methodologies
    Moumita Basu, Kripabandhu Ghosh, Somenath Das, Somprakash Bandyopadhyay, and Saptarshi Ghosh
Overview of the Mixed Script Information Retrieval (MSIR) at FIRE-2016
    Somnath Banerjee, Kunal Chakma, Sudip Kumar Naskar, Amitava Das, Paolo Rosso, Sivaji Bandyopadhyay, and Monojit Choudhury
From Vector Space Models to Vector Space Models of Semantics
    H. B. Barathi Ganesh, M. Anand Kumar, and K. P. Soman
Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016
    Habibollah Asghari, Salar Mohtaj, Omid Fatemi, Heshaam Faili, Paolo Rosso, and Martin Potthast
Predicting Type of Obfuscation to Enhance Text Alignment Algorithms
    Fatemeh Mashhadirajab and Mehrnoush Shamsfard
A Fast Multi-level Plagiarism Detection Method Based on Document Embedding Representation
    Erfaneh Gharavi, Hadi Veisi, Kayvan Bijari, and Kiarash Zahirnia
Plagiarism Detection Based on a Novel Trie-Based Approach
    Alireza Talebpour, Mohammad Shirzadi Laskoukelayeh, and Zahra Aminolroaya
Using Local Text Similarity in Pairwise Document Analysis for Monolingual Plagiarism Detection
    Nava Ehsan and Azadeh Shakery
Shared Task on Detecting Paraphrases in Indian Languages (DPIL): An Overview
    M. Anand Kumar, Shivkaran Singh, B. Kavirajan, and K. P. Soman
Anuj@DPIL-FIRE2016: A Novel Paraphrase Detection Method in Hindi Language Using Machine Learning
    Anuj Saini and Aayushi Verma
Learning to Detect Paraphrases in Indian Languages
    Kamal Sarkar
Sentence Paraphrase Detection Using Classification Models
    Liuyang Tian, Hui Ning, Leilei Kong, Kaisheng Chen, Haoliang Qi, and Zhongyuan Han
Feature Engineering and Characterization of Classifiers for Consumer Health Information Search
    D. Thenmozhi, P. Mirunalini, and Chandrabose Aravindan
Identification of Relevance and Support for Consumer Health Information
    Suresh Kumar Sanampudi and Naveen Kumar Laskari
Entity Extraction of Hindi-English and Tamil-English Code-Mixed Social Media Text
    G. Remmiya Devi, P. V. Veena, M. Anand Kumar, and K. P. Soman
Author Index

PAN@FIRE: Overview of the PR-SOCO Track on Personality Recognition in SOurce COde

Francisco Rangel1, Fabio González2, Felipe Restrepo2, Manuel Montes3, and Paolo Rosso4

1 Autoritas Consulting, Valencia, Spain
  [email protected]
2 MindLab Research Group, Universidad Nacional de Colombia, Bogotá, Colombia
  {fagonzalezo,ferestrepoca}@unal.edu.co
3 INAOE, Puebla, Mexico
  [email protected]
4 PRHLT Research Center, Universitat Politècnica de València, Valencia, Spain
  [email protected]

Abstract. Author profiling consists of predicting an author's demographics (e.g. age, gender, personality) from her writing. After addressing mainly age and gender identification at PAN@CLEF, and also personality recognition in Twitter (http://pan.webis.de/), in this PAN@FIRE track on Personality Recognition from SOurce COde (PR-SOCO) we have addressed the problem of predicting an author's personality from her source code. In this paper, we analyse 48 runs sent by 11 participants. Given a set of source codes written in Java by students who also answered a personality test, participants had to predict the big five traits. Results have been evaluated with two complementary measures (RMSE and Pearson product-moment correlation), which made it possible to identify whether systems with low error rates may work due to random chance. No matter the approach, openness is the trait for which the best results were obtained on both measures.

Keywords: Personality recognition · Source code · Author profiling

1 Introduction

Personality influences most, if not all, human activities, such as the way people write [6,25], interact with others, and make decisions; for instance, in the case of developers, the criteria they consider when selecting a software project they want to participate in [22], or the way they write and structure their source code. Personality is defined along five traits using the Big Five Theory [7], which is the most widely accepted in psychology. The five traits are: extroversion (E), emotional stability/neuroticism (S), agreeableness (A), conscientiousness (C), and openness to experience (O).

Personality recognition may have several practical applications, for example to set up high performance teams. In software development, not only technical

© Springer International Publishing AG 2018
P. Majumder et al. (Eds.): FIRE 2016 Workshop, LNCS 10478, pp. 1–19, 2018.
https://doi.org/10.1007/978-3-319-73606-8_1
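The abstract of the PR-SOCO overview names two complementary evaluation measures, RMSE and the Pearson product-moment correlation, and notes that a low error rate alone may be due to chance. The following is a minimal sketch of why the two are reported together, not the track's official evaluation script: the function names `rmse` and `pearson` and all trait scores are hypothetical and chosen only for illustration. A predictor that always outputs the same value and a predictor that follows the gold ordering can have the same RMSE, yet only the second correlates with the gold scores.

```python
import math


def rmse(gold, pred):
    """Root mean squared error between gold and predicted trait scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold))


def pearson(gold, pred):
    """Pearson product-moment correlation; NaN if either input is constant."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_p = sum(pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_g == 0 or var_p == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_g * var_p)


# Hypothetical openness scores for four authors (illustrative values only).
gold = [40, 50, 60, 50]
constant = [50, 50, 50, 50]   # always predicts the mean
tracking = [30, 50, 70, 50]   # exaggerates, but follows the gold ordering

for name, pred in (("constant", constant), ("tracking", tracking)):
    print(name, round(rmse(gold, pred), 3), pearson(gold, pred))
```

Both hypothetical systems end up with the same RMSE, so the error measure alone cannot separate them; the correlation is what distinguishes a system that follows the gold scores from one that does not.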
