
The Naïve Bayes Model for Unsupervised Word Sense Disambiguation: Aspects Concerning Feature Selection PDF

78 Pages·2013·0.841 MB·English

Preview The Naïve Bayes Model for Unsupervised Word Sense Disambiguation: Aspects Concerning Feature Selection

SpringerBriefs in Statistics

For further volumes: http://www.springer.com/series/8921

Florentina T. Hristea

The Naïve Bayes Model for Unsupervised Word Sense Disambiguation
Aspects Concerning Feature Selection

Florentina T. Hristea
Faculty of Mathematics and Computer Science
Department of Computer Science
University of Bucharest
Bucharest
Romania

ISSN 2191-544X
ISSN 2191-5458 (electronic)
ISBN 978-3-642-33692-8
ISBN 978-3-642-33693-5 (eBook)
DOI 10.1007/978-3-642-33693-5
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2012949374
© The Author(s) 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To the memory of my father, Prof. Dr. Theodor Hristea, who has passed on to me his love for words

Preface

The present work concentrates on the issue of feature selection for the Naïve Bayes model with application in unsupervised word sense disambiguation (WSD). It examines the process of feature selection while referring to an unsupervised corpus-based method for automatic WSD that relies on this specific statistical model. It concentrates on a distributional approach to unsupervised WSD based on monolingual corpora, with focus on the usage of the Naïve Bayes model as clustering technique.

While the Naïve Bayes model has been widely and successfully used in supervised WSD, its usage in unsupervised WSD has led to more modest disambiguation results and is less frequent. One could, in fact, say that it has been entirely dropped. The latest and most comprehensive survey¹ on WSD refers to the Naïve Bayes model strictly in conjunction with supervised WSD noting that “in spite of the independence assumption, the method compares well with other supervised methods” (Navigli 2009). It seems that the potential of this statistical model in unsupervised WSD continues to remain insufficiently explored. We feel that unsupervised WSD has not yet made full use of the Naïve Bayes model.

It is equally our belief that the Naïve Bayes model needs to be fed knowledge in order to perform well as clustering technique for unsupervised WSD. This knowledge can be fed in various ways and can be of various natures.
The present work studies such knowledge of completely different types and hopes to initiate an open discussion concerning the nature of the knowledge that is best suited for the Naïve Bayes model when acting as clustering technique. Three different sources of such knowledge, which have been used only very recently in the literature (relatively to this specific clustering technique) are being examined and compared: WordNet, dependency relations, and web N-grams. This study ultimately concentrates not on WSD (which is regarded as an application) but on the issue of feeding knowledge to the Naïve Bayes model for feature selection.

¹ Navigli, R.: Word Sense Disambiguation: A Survey. ACM Comput. Surv. 41(2), 1–69 (2009).

The present work represents a synthesis of 5 journal papers that have been authored or coauthored by us during the time interval 2008–2012, when our scientific interest was fully captured by the issue of feature selection for the Naïve Bayes model. This research is hereby extended, with two important additional conclusions being drawn in Chaps. 4 and 5. Each chapter will introduce knowledge of a different type, that is to be fed to the Naïve Bayes model, indicating those words (features) that should be part of the so-called “disambiguation vocabulary” when trying to decrease the number of parameters for unsupervised WSD based on this statistical model.

This work therefore places WSD with an underlying Naïve Bayes model at the border between unsupervised and knowledge-based techniques. It highlights the benefits of feeding knowledge (of various natures) to a knowledge-lean algorithm for unsupervised WSD that uses the Naïve Bayes model as clustering technique. Our study will show that a basic, simple knowledge-lean disambiguation algorithm, hereby represented by the Naïve Bayes model, can perform quite well when provided knowledge in an appropriate way. It will equally justify our belief that the Naïve Bayes model still holds a promise for the open problem of unsupervised WSD.

Toulouse, France, November 2011
Florentina T. Hristea

Acknowledgments

The author expresses her deepest gratitude to Professor Ted Pedersen for having provided the dataset necessary for performing the presented tests and comparisons with respect to adjectives and verbs. We are equally indebted to two anonymous referees for their valuable comments and suggestions. This research was supported by the National University Research Council of Romania (the “Ideas” research program, PN II-IDEI), Contract No. 659/2009.

Contents

1 Preliminaries
  1.1 Introduction: The Problem
  1.2 Word Sense Disambiguation (WSD)
  1.3 Naïve Bayes-Based WSD at the Border Between Unsupervised and Knowledge-Based Techniques
    1.3.1 Pedersen and Bruce Local-Type Features
  References

2 The Naïve Bayes Model in the Context of Word Sense Disambiguation
  2.1 Introduction
  2.2 The Probability Model of the Corpus and the Bayes Classifier
  2.3 Parameter Estimation
  References

3 Semantic WordNet-Based Feature Selection
  3.1 Introduction
  3.2 WordNet
  3.3 Making Use of WordNet for Feature Selection
  3.4 Empirical Evaluation
    3.4.1 Design of the Experiments
    3.4.2 Test Results
  3.5 Conclusions
  References

4 Syntactic Dependency-Based Feature Selection
  4.1 Introduction
  4.2 A Dependency-Based Semantic Space for WSD with a Naïve Bayes Model
    4.2.1 Dependency-Based Feature Selection
  4.3 Empirical Evaluation
    4.3.1 Design of the Experiments
    4.3.2 Test Results
  4.4 Conclusions
  References

5 N-Gram Features for Unsupervised WSD with an Underlying Naïve Bayes Model
  5.1 Introduction
  5.2 The Web as a Corpus
  5.3 Experimental Results
    5.3.1 Corpora
    5.3.2 Tests
    5.3.3 Adding Knowledge from an External Knowledge Source
  5.4 Conclusions
  References

Index

Acronyms

AI   Artificial Intelligence
DG   Dependency Grammar
LSA  Latent Semantic Analysis
LSI  Latent Semantic Indexing
NLP  Natural Language Processing
POS  Part of Speech
WN   WordNet
WSD  Word Sense Disambiguation
