
Natural language processing for online applications: text retrieval, extraction and categorization PDF

237 Pages·2002·1.283 MB·Natural Language Planning


Natural Language Processing for Online Applications

Natural Language Processing
Editor
Prof. Ruslan Mitkov
School of Humanities, Languages and Social Sciences
University of Wolverhampton
Stafford St.
Wolverhampton WV1 1SB, United Kingdom

Advisory Board
Christian Boitet (University of Grenoble)
John Carroll (University of Sussex, Brighton)
Eugene Charniak (Brown University, Providence)
Eduard Hovy (Information Sciences Institute, USC)
Richard Kittredge (University of Montreal)
Geoffrey Leech (Lancaster University)
Carlos Martin-Vide (Rovira i Virgili Un., Tarragona)
Andrei Mikheev (University of Edinburgh)
John Nerbonne (University of Groningen)
Nicolas Nicolov (IBM, T.J. Watson Research Center)
Kemal Oflazer (Sabanci University)
Allan Ramsey (UMIST, Manchester)
Monique Rolbert (Université de Marseille)
Richard Sproat (AT&T Labs Research, Florham Park)
Keh-Yih Su (Behaviour Design Corp.)
Isabelle Trancoso (INESC, Lisbon)
Benjamin Tsou (City University of Hong Kong)
Jun-ichi Tsujii (University of Tokyo)
Evelyne Tzoukermann (Bell Laboratories, Murray Hill)
Yorick Wilks (University of Sheffield)

Volume 5
Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization
by Peter Jackson and Isabelle Moulinier

Natural Language Processing for Online Applications
Text Retrieval, Extraction and Categorization

Peter Jackson
Isabelle Moulinier
Thomson Legal & Regulatory

John Benjamins Publishing Company
Amsterdam / Philadelphia

The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Library of Congress Cataloging-in-Publication Data
Jackson, Peter, 1948-
Natural language processing for online applications : text retrieval, extraction, and categorization / Peter Jackson, Isabelle Moulinier.
p. cm. (Natural Language Processing, ISSN 1567-8202; v. 5)
Includes bibliographical references and index.
I. Jackson, Peter. II. Moulinier, Isabelle. III. Title. IV. Series.
QA76.9.N38 I33 2002   006.3'5--dc21   2002066539
ISBN 90 272 4988 1 (Eur.) / 1 58811 249 7 (US) (Hb; alk. paper)
ISBN 90 272 4989 X (Eur.) / 1 58811 250 0 (US) (Pb; alk. paper)

© 2002 John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.

John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Table of contents

Preface

Chapter 1  Natural language processing
  1.1 What is NLP?
  1.2 NLP and linguistics
    1.2.1 Syntax and semantics
    1.2.2 Pragmatics and context
    1.2.3 Two views of NLP
    1.2.4 Tasks and supertasks
  1.3 Linguistic tools
    1.3.1 Sentence delimiters and tokenizers
    1.3.2 Stemmers and taggers
    1.3.3 Noun phrase and name recognizers
    1.3.4 Parsers and grammars
  1.4 Plan of the book

Chapter 2  Document retrieval
  2.1 Information retrieval
  2.2 Indexing technology
  2.3 Query processing
    2.3.1 Boolean search
    2.3.2 Ranked retrieval
    2.3.3 Probabilistic retrieval
    2.3.4 Language modeling
  2.4 Evaluating search engines
    2.4.1 Evaluation studies
    2.4.2 Evaluation metrics
    2.4.3 Relevance judgments
    2.4.4 Total system evaluation
  2.5 Attempts to enhance search performance
    2.5.1 Query expansion and thesauri
    2.5.2 Query expansion from relevance information*
  2.6 The future of Web searching
    2.6.1 Indexing the Web
    2.6.2 Searching the Web
    2.6.3 Ranking and reranking documents
    2.6.4 The state of online search
  2.7 Summary of information retrieval

Chapter 3  Information extraction
  3.1 The Message Understanding Conferences
  3.2 Regular expressions
  3.3 Finite automata in FASTUS
    3.3.1 Finite State Machines and regular languages
    3.3.2 Finite State Machines as parsers
  3.4 Pushdown automata and context-free grammars
    3.4.1 Analyzing case reports
    3.4.2 Context free grammars
    3.4.3 Parsing with a pushdown automaton
    3.4.4 Coping with incompleteness and ambiguity
  3.5 Limitations of current technology and future research
    3.5.1 Explicit versus implicit statements
    3.5.2 Machine learning for information extraction
    3.5.3 Statistical language models for information extraction
  3.6 Summary of information extraction

Chapter 4  Text categorization
  4.1 Overview of categorization tasks and methods
  4.2 Handcrafted rule based methods
  4.3 Inductive learning for text classification
    4.3.1 Naïve Bayes classifiers
    4.3.2 Linear classifiers*
    4.3.3 Decision trees and decision lists
  4.4 Nearest Neighbor algorithms
  4.5 Combining classifiers
    4.5.1 Data fusion
    4.5.2 Boosting
    4.5.3 Using multiple classifiers
  4.6 Evaluation of text categorization systems
    4.6.1 Evaluation studies
    4.6.2 Evaluation metrics
    4.6.3 Relevance judgments
    4.6.4 System evaluation

Chapter 5  Towards text mining
  5.1 What is text mining?
  5.2 Reference and coreference
    5.2.1 Named entity recognition
    5.2.2 The coreference task
  5.3 Automatic summarization
    5.3.1 Summarization tasks
    5.3.2 Constructing summaries from document fragments
    5.3.3 Multi-document summarization (MDS)
  5.4 Testing of automatic summarization programs
    5.4.1 Evaluation problems in summarization research
    5.4.2 Building a corpus for training and testing
  5.5 Prospects for text mining and NLP

Index

Preface

There is no single text on the market that covers the emerging technologies of document retrieval, information extraction, and text categorization in a coherent fashion. This book seeks to satisfy a genuine need on the part of technology practitioners in the Internet space, who are faced with having to make difficult decisions as to what research has been done, and what the best practices are. It is not intended as a vendor guide (such things are quickly out of date), or as a recipe for building applications (such recipes are very context-dependent).
But it does identify the key technologies, the issues involved, and the strengths and weaknesses of the various approaches. There is also a strong emphasis on evaluation in every chapter, both in terms of methodology (how to evaluate) and what controlled experimentation and industrial experience have to tell us.

I was prompted to write this book after spending seven years running an R&D group in an Internet publishing and solutions business. During that time, we were able to put into production a number of systems that either generated revenue or enabled cost savings for the company, leveraging technologies from information retrieval, information extraction, and text categorization. This is not a chronicle of these exploits, but a primer for those who are already interested in natural language processing for online applications. Nevertheless, my treatment of the philosophy and practice of language processing is colored by the context in which I function, namely the arena of commercial exploitation. Thus, although there is a focus on technical detail and research results, I also address some of the issues that arise in applying such systems to data collections of realistic size and complexity.

The book is not intended exclusively as an academic text, although I suspect that it will be of interest to students who wish to use these technologies in an industrial setting. It is also aimed at software engineers, project managers, and technology executives who want or need to understand the technology at some level. I hope that such people find it useful, and that it provokes ideas, discussion, and action in the field of applied research and development.

Each chapter begins with lighter material and then progresses to heavier stuff, with some of the later sections and sidebars being marked with an asterisk as
