
Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining PDF

531 Pages·2016·27.22 MB·English

Preview Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining

Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining
ChengXiang Zhai and Sean Massung

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. This has led to an increasing demand for powerful software tools to help people manage and analyze vast amounts of text data effectively and efficiently. Unlike data generated by a computer system or sensors, text data are usually generated directly by humans, and capture semantically rich content. As such, text data are especially valuable for discovering knowledge about human opinions and preferences, in addition to many other kinds of knowledge that we encode in text. In contrast to structured data, which conform to well-defined schemas (thus are relatively easy for computers to handle), text has less explicit structure, requiring computer processing toward understanding of the content encoded in text. The current technology of natural language processing has not yet reached a point to enable a computer to precisely understand natural language text, but a wide range of statistical and heuristic approaches to management and analysis of text data have been developed over the past few decades. They are usually very robust and can be applied to analyze and manage text data in any natural language, and about any topic.

This book provides a systematic introduction to many of these approaches, with an emphasis on covering the most useful knowledge and skills required to build a variety of practically useful text information systems.
Because humans can understand natural languages far better than computers can, effective involvement of humans in a text information system is generally needed and text information systems often serve as intelligent assistants for humans. Depending on how a text information system collaborates with humans, we distinguish two kinds of text information systems. The first is information retrieval systems which include search engines and recommender systems; they assist users in finding from a large collection of text data the most relevant text data that are actually needed for solving a specific application problem, thus effectively turning big raw text data into much smaller relevant text data that can be more easily processed by humans. The second is text mining application systems; they can assist users in analyzing patterns in text data to extract and discover useful actionable knowledge directly useful for task completion or decision making, thus providing more direct task support for users.

ABOUT ACM BOOKS
ACM Books is a new series of high quality books for the computer science community, published by ACM in collaboration with Morgan & Claypool Publishers. ACM Books publications are widely distributed in both print and digital formats through booksellers and to libraries (and library consortia) and individual ACM members via the ACM Digital Library platform.

ISBN: 978-1-97000-116-7
BOOKS.ACM.ORG • WWW.MORGANCLAYPOOL.COM

ACM Books
Editor in Chief: M. Tamer Özsu, University of Waterloo
Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining
ChengXiang Zhai, University of Illinois at Urbana–Champaign
Sean Massung, University of Illinois at Urbana–Champaign
2016

An Architecture for Fast and General Data Processing on Large Clusters
Matei Zaharia, Massachusetts Institute of Technology
2016

Reactive Internet Programming: State Chart XML in Action
Franck Barbier, University of Pau, France
2016

Verified Functional Programming in Agda
Aaron Stump, The University of Iowa
2016

The VR Book: Human-Centered Design for Virtual Reality
Jason Jerald, NextGen Interactions
2016

Ada’s Legacy: Cultures of Computing from the Victorian to the Digital Age
Robin Hammerman, Stevens Institute of Technology
Andrew L. Russell, Stevens Institute of Technology
2016

Edmund Berkeley and the Social Responsibility of Computer Professionals
Bernadette Longo, New Jersey Institute of Technology
2015

Candidate Multilinear Maps
Sanjam Garg, University of California, Berkeley
2015

Smarter than Their Machines: Oral Histories of Pioneers in Interactive Computing
John Cullinane, Northeastern University; Mossavar-Rahmani Center for Business and Government, John F. Kennedy School of Government, Harvard University
2015

A Framework for Scientific Discovery through Video Games
Seth Cooper, University of Washington
2014

Trust Extension as a Mechanism for Secure Code Execution on Commodity Computers
Bryan Jeffrey Parno, Microsoft Research
2014

Embracing Interference in Wireless Systems
Shyamnath Gollakota, University of Washington
2014

Text Data Management and Analysis
A Practical Introduction to Information Retrieval and Text Mining
ChengXiang Zhai, University of Illinois at Urbana–Champaign
Sean Massung, University of Illinois at Urbana–Champaign
ACM Books #12

Copyright © 2016 by the Association for Computing Machinery and Morgan & Claypool Publishers

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other, except for brief quotations in printed reviews) without the prior permission of the publisher.
Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan & Claypool is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

Text Data Management and Analysis
ChengXiang Zhai and Sean Massung
books.acm.org
www.morganclaypoolpublishers.com

ISBN: 978-1-97000-119-8 hardcover
ISBN: 978-1-97000-116-7 paperback
ISBN: 978-1-97000-117-4 ebook
ISBN: 978-1-97000-118-1 ePub

Series ISSN: 2374-6769 print, 2374-6777 electronic

DOIs:
10.1145/2915031 Book
10.1145/2915031.2915032 Preface
10.1145/2915031.2915033 Chapter 1
10.1145/2915031.2915034 Chapter 2
10.1145/2915031.2915035 Chapter 3
10.1145/2915031.2915036 Chapter 4
10.1145/2915031.2915037 Chapter 5
10.1145/2915031.2915038 Chapter 6
10.1145/2915031.2915039 Chapter 7
10.1145/2915031.2915040 Chapter 8
10.1145/2915031.2915041 Chapter 9
10.1145/2915031.2915042 Chapter 10
10.1145/2915031.2915043 Chapter 11
10.1145/2915031.2915044 Chapter 12
10.1145/2915031.2915045 Chapter 13
10.1145/2915031.2915046 Chapter 14
10.1145/2915031.2915047 Chapter 15
10.1145/2915031.2915048 Chapter 16
10.1145/2915031.2915049 Chapter 17
10.1145/2915031.2915050 Chapter 18
10.1145/2915031.2915051 Chapter 19
10.1145/2915031.2915052 Chapter 20
10.1145/2915031.2915053 Appendices
10.1145/2915031.2915054 References
10.1145/2915031.2915055 Index

A publication in the ACM Books series, #12
Editor in Chief: M. Tamer Özsu, University of Waterloo
Area Editor: Edward A. Fox, Virginia Tech

First Edition
10 9 8 7 6 5 4 3 2 1

To Mei and Alex
To Kai

Contents

Preface
Acknowledgments

PART I OVERVIEW AND BACKGROUND

Chapter 1 Introduction
  1.1 Functions of Text Information Systems
  1.2 Conceptual Framework for Text Information Systems
  1.3 Organization of the Book
  1.4 How to Use this Book
  Bibliographic Notes and Further Reading

Chapter 2 Background
  2.1 Basics of Probability and Statistics
  2.2 Information Theory
  2.3 Machine Learning
  Bibliographic Notes and Further Reading
  Exercises

Chapter 3 Text Data Understanding
  3.1 History and State of the Art in NLP
  3.2 NLP and Text Information Systems
  3.3 Text Representation
  3.4 Statistical Language Models
  Bibliographic Notes and Further Reading
  Exercises

