
Natural Language Processing and Information Systems: 14th International Conference on Applications of Natural Language to Information Systems, NLDB 2009, Saarbrücken, Germany, June 24-26, 2009. Revised Papers PDF

334 Pages·2010·8.807 MB·


Lecture Notes in Computer Science 5723
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

Helmut Horacek, Elisabeth Métais, Rafael Muñoz, Magdalena Wolska (Eds.)

Natural Language Processing and Information Systems
14th International Conference on Applications of Natural Language to Information Systems, NLDB 2009
Saarbrücken, Germany, June 24-26, 2009
Revised Papers

Volume Editors

Helmut Horacek
Saarland University, Dept. of Computer Science
P.O. Box 151150, 66041 Saarbrücken, Germany
E-mail: [email protected]

Elisabeth Métais
CNAM - Laboratoire Cédric
292 Rue St. Martin, 75141 Paris Cedex 03, France
E-mail: [email protected]

Rafael Muñoz
Universidad de Alicante, Departamento de Lenguajes y Sistemas Informáticos
Campus de San Vicente del Raspeig, Apdo 99, 03080 Alicante, Spain
E-mail: [email protected]

Magdalena Wolska
Saarland University, Dept. of General Linguistics
P.O. Box 151150, 66041 Saarbrücken, Germany
E-mail: [email protected]

Library of Congress Control Number: 2010924365
CR Subject Classification (1998): H.2.8, H.2, H.3, I.2, F.3-4, H.4, C.2
LNCS Sublibrary: SL 3 – Information Systems and Application, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-642-12549-2 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-12549-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

springer.com
© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper 06/3180

Preface

This volume contains the papers presented at NLDB 2009, the 14th International Conference on Applications of Natural Language to Information Systems held June 24–26, 2009, at the University of the Saarland and the German Research Center for Artificial Intelligence in Saarbrücken, Germany. In addition to reviewed submissions, the program also included contributions to the doctoral symposium held during NLDB 2009 as well as two invited talks. These talks covered some of the currently hot topics in the use of natural language for accessing information systems.

We received 51 submissions as regular papers for the main conference, 2 extra submissions as posters, and 3 short papers for the doctoral symposium. Each paper for the main conference was assigned four reviewers, taking into account preferences expressed by the Program Committee members as much as possible. Within the review deadline, we received at least three reviews for almost all submissions.
After the review deadline, the Conference Organizing Committee members and the Program Committee Chair acted as meta-reviewers. This task included studying the reviews and the papers, specifically those whose assessment made them borderline cases, and discussing conflicting opinions and their impact on the assessment of individual papers. Finally, the meta-reviewers wrote additional reviews for the few papers which received fewer than three reviews, as well as for papers which received reviews with considerably conflicting assessments.

In order to come up with a final decision, the meta-reviewers used a ranking list according to the weighted average scores of all papers on a scale from -3 (lowest possible) to +3 (highest possible), the reviewer's confidence being used as the weighting factor. Submissions with a score greater than or equal to 1.0 (weak accept) were accepted as full papers. For this threshold, we used the unweighted average score, which gives the precise cutoff value a conceptual motivation. In general, the differences between weighted and unweighted scores were mostly marginal; almost all submissions accepted as full papers also scored at least 1.0 on the weighted average (only two scored 0.9 on it). As short papers, we accepted submissions with scores lower than those of the full papers, which got either three positive scores or two positive scores but no negative ones. As posters, finally, we accepted most submissions which were assessed at least slightly positively, individually deciding some borderline cases.

The final acceptance rate, counting the number of full papers according to NLDB tradition, was 25.5% (13 out of 51), quite close to the rates in at least the previous four years. In addition, eight submissions were accepted as short papers, and seven as posters, including one of the extra submissions, a system demo. Originally, two more posters were accepted, but the authors preferred to withdraw their submissions once they were only accepted as posters.
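The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the preface does not give the confidence scale, so the 1 to 3 confidence values, the sample reviews, and all function names below are hypothetical and are not taken from the organizers' actual tooling.

```python
from statistics import mean

# Each review is a (score, confidence) pair.
# Scores range from -3 to +3, as stated in the preface.
# The confidence scale (1..3 here) is an assumption made for this sketch.

def unweighted_score(reviews):
    """Plain average of the review scores."""
    return mean(score for score, _ in reviews)

def weighted_score(reviews):
    """Average of the review scores, weighted by reviewer confidence."""
    total_confidence = sum(conf for _, conf in reviews)
    return sum(score * conf for score, conf in reviews) / total_confidence

def is_full_paper(reviews, threshold=1.0):
    """Full-paper rule from the preface: unweighted average >= 1.0."""
    return unweighted_score(reviews) >= threshold

def acceptance_rate(accepted, submitted):
    """Acceptance rate in percent."""
    return 100 * accepted / submitted

# Hypothetical submission with three reviews.
reviews = [(2, 3), (1, 2), (0, 1)]
print(unweighted_score(reviews), weighted_score(reviews), is_full_paper(reviews))
print(round(acceptance_rate(13, 51), 1))
```

In this hypothetical case the confident reviewer gave the highest score, so the weighted average exceeds the unweighted one; the last line reproduces the arithmetic behind the reported rate of 13 full papers out of 51 submissions.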
Finally, the three submissions to the doctoral symposium were all accepted.

Since the short papers were not assessed much lower than some of the full papers, we also chose a relatively small difference between the space devoted to these categories: full papers were allowed a maximum of 15 pages and short papers a maximum of 12 pages, while posters got only 2 pages. There was, however, a more pronounced difference between full and short papers in terms of the presentation time.

In this volume, most contributions were grouped according to their category. First, there are the two invited papers; then the majority of the contributions, long and short papers together. These papers were grouped according to topic areas. This is followed by a section with the posters. Finally, there are the three doctoral symposium papers.

The invited papers both address learning of specific natural language concepts from large corpora. The paper by Hovy investigates contributions to the long-range issue of building a database out of the information found on the Web. This includes automated instance mining, metastructure harvesting, and inter-concept relation discovery. The paper by Uszkoreit et al. reports on a systematic analysis of a minimally supervised machine learning method for relation extraction grammars. These investigations led to insights into how the performance of the algorithm depends on properties of the data and on the selection of the seeds. Consequently, the learning method can be further improved by taking these dependencies into account.
The accepted contributions (long and short papers) covered a wide range of topics, which we classified into eight topic areas, each covering a section in this volume:

– Information Retrieval
– Term Extraction
– Information Extraction
– Classification of Text
– Classification of Documents
– Interfaces to Knowledge Bases
– Using Semantic Models of Natural Language
– Quality Assessment of Knowledge Sources

Information Retrieval. Two full papers were categorized in this section. They emphasize the role of semantic information to improve some specific forms of information retrieval. The first paper by Navarro et al. introduces evidence from natural language in multimodal fusion techniques. Through incorporating natural language annotations related to images, the performance of two widely used fusion strategies in visual information retrieval can be significantly improved. The second paper by Sorg and Cimiano applies the model of explicit semantic analysis to cross-language retrieval. The authors perform a systematic investigation through examining variations between different basic design choices, yielding considerable improvements over the original model.

Term Extraction. One full and two short papers were categorized in this section. They address a variety of applications of term extraction techniques, ranging from topic maps over schema matching to requirements analysis, mostly by combining evidence from several knowledge sources. The first one, the full paper by Ellouze et al., proposes an incremental construction of multilingual topic maps. In this construction process, it takes into account multiple knowledge sources including, as a particular factor, evidence about requests of potential users. The first of the two short papers by Coen and Xue addresses the problem of schema-matching, that is, semantic equivalences across namespaces. The algorithm presented makes use of textual information and dependency information.
The second of the two short papers by Kof analyzes concept extraction techniques to be used in the construction of executable models from textual specifications. The paper obtains the first results in this new research direction through systematic comparisons in a case study.

Information Extraction. One full and one short paper were categorized in this section. They address issues of information extraction in non-standard applications. The full paper by Segura-Bedmar et al. investigates anaphora resolution for the yet untested domain of drug interaction. They achieve results similar to those obtained for other domains by using domain-specific syntactic and semantic parsers. The short paper by Had et al. investigates standard relation extraction techniques adapted to the German language. Through an enhanced composite kernel method, they improve significantly over the poor results of standard methods, which are not well suited to free word order languages, such as German.

Classification of Text. Two full and one short paper were categorized in this section. They deal with a nice selection of special, rarely addressed topics. The first full paper by Asonov aims at effective spelling correction. By simplifying the task into only finding typographical errors rather than also correcting them, a task which falls under the responsibility of the human writer anyway, Asonov achieves improved results over other approaches. The other full paper by Reyes et al. aims at recognizing humor, a task in which computers are notoriously bad performers. The authors demonstrate that an examination of features of semantic and morpho-syntactic ambiguities enables a rather good discrimination between humor and non-humor. The short paper by Balahur and Montoyo addresses opinion mining. They apply a general annotation scheme that is suitable for a variety of domains, with quite promising results.

Classification of Documents. Two full and one short paper were categorized in this section. They cover quite a large range of techniques.
The first full paper by Keim et al. aims at the extraction of discriminating and overlap terms out of a set of document classes. Their widely language-independent method is shown to outperform several competing approaches. The other full paper by Ponomareva et al. describes a fully implemented system for archiving institutional repositories. The semi-automatic approach combines automatic discovery and extraction methods with user interaction techniques to ensure the quality of the bibliographic data obtained. The short paper by He and Lin proposes a semi-supervised learning algorithm via local learning with class priors to address text classification for the domain of protein–protein interactions. The authors demonstrate that their compound algorithm outperforms more traditional methods.

Interfaces to Knowledge Bases. Two full and one short paper were categorized in this section. The full papers address traditional interfaces to databases; the short one aims at an interface applicable to languages that are based on images. The first full paper by Cimiano and Minock investigates a quantitative grounding in empirical data. The evidence obtained from a geodatabase makes it clear that the demand on interfaces is quite high, since these systems must have some means to deal with a variety of natural language phenomena, each of which appears with some frequency in the corpus. The other full paper by Giordani and Moschitti attacks the problem of mapping natural language questions onto SQL queries via learning techniques, on the basis of a corpus of pairs of natural language questions and database queries. Like the previous paper, they also use a geodatabase as their domain of application. The short paper by Sidorov et al.
reports on pioneering work for a Mayan script database, the first computer-based representation for this antique sign language. The incorporated methods are widely applicable to languages based on sets of images, and the implemented system can accommodate the demands of users with varying background knowledge.

Using Semantic Models of Natural Language. Two full papers were categorized in this section. They both aim at the attainment of increased quality through the incorporation of semantics, for different application areas. The first one by Llorens et al. investigates the incorporation of semantic roles in the task of temporal expression identification. The authors demonstrate that the approach is quite valuable, especially since it is less demanding in terms of training data and development time, when compared to machine learning and knowledge-based approaches, respectively. The other full paper by Adly and Al Ansary presents an interlingua-based machine translation system, evaluated for English-Arabic translation. The elaborate evaluations in several automated metrics demonstrate that this system performs better than some well-known competitors in English-Arabic translation.

Quality Assessment of Knowledge Sources. One full and two short papers were categorized in this section. They address properties of ontologies and corpora as their target of quality evaluation. The full paper by Solskinnsbakk et al. investigates issues in verifying the quality of subsumption hierarchies. The authors formulate hypotheses about relations between classes in super/sub and sister relations, verifying these hypotheses for some widely used ontologies. The first short paper by Sabou et al. investigates the problem of evaluating the correctness of semantic relations. They use online ontologies and the Semantic Web in order to test the plausibility of a semantic relation. The other short paper by Pinto et al.
investigates the quality assessment of text corpora, on the basis of corpus features, such as domain broadness and class imbalance. The formally obtained quality assessments are shown to be comparable to human judgements. The methods presented could be used, for example, to assess the quality of gold standards.

The posters included in this volume cover a wide range of topics, including navigational semantics, word level alignment, data mart schema design, spreadsheet information retrieval, weblog corpora, full text search, and knowledge management. Finally, the papers from the doctoral symposium address textual entailment, spreadsheet information retrieval, and speech interpretation.

The conference organizers are indebted to the reviewers for their engagement in a vigorous submission evaluation process. We would also like to thank members of the DFKI GmbH, and our various student helpers, for their help with the organization.

June 2009
Helmut Horacek
Elisabeth Métais
Rafael Muñoz
Magdalena Wolska

Organization

Conference Organization
Helmut Horacek, Saarland University, Saarbrücken, Germany
Elisabeth Métais, CNAM, Paris, France
Reind van de Riet (+), Vrije Universiteit Amsterdam, The Netherlands

Program Chair
Rafael Muñoz, Universidad de Alicante, Spain

Doctoral Symposium Chair
Magdalena Wolska, Saarland University, Saarbrücken, Germany

Publicity Chair
Mokrane Bouzeghoub, Université de Versailles, France

Program Committee
Jacky Akoka, CNAM, France
Sophia Ananiadou, Manchester Interdisciplinary Biocentre, UK
Frederic Andres, University of Advanced Studies, Japan
Jing Bai, Yahoo Inc., Canada
Akhilesh Bajaj, University of Tulsa, USA
Mokrane Bouzeghoub, Université de Versailles, France
Hiram Calvo, National Polytechnic Institute, Mexico
Roger Chiang, University of Cincinnati, USA
Philip Cimiano, University of Delft, The Netherlands
Isabelle Comyn-Wattiau, CNAM, France
Antje Düsterhöft, Hochschule Wismar, Germany
Günther Fliedl, University of Klagenfurt, Austria
Alexander Gelbukh, Mexican Academy of Sciences, Mexico
Jon Atle Gulla, Norwegian University of Science and Technology, Norway
Udo Hahn, Friedrich-Schiller-Universität Jena, Germany
Karin Harbusch, Universität Koblenz-Landau, Germany
Harmain Harmain, United Arab Emirates University, UAE
Alexander Hinneburg, University of Halle, Germany
