
Sentic Computing: Techniques, Tools, and Applications PDF

165 Pages·2012·2.985 MB·English

Preview Sentic Computing: Techniques, Tools, and Applications

SpringerBriefs in Cognitive Computation

For further volumes: http://www.springer.com/series/10374

Erik Cambria • Amir Hussain

Sentic Computing: Techniques, Tools, and Applications

Erik Cambria, Media Laboratory, Massachusetts Institute of Technology, Cambridge, USA
Amir Hussain, Department of Computing Science, University of Stirling, Stirling, UK

ISSN 2212-6023
ISSN 2212-6031 (electronic)
ISBN 978-94-007-5069-2
ISBN 978-94-007-5070-8 (eBook)
DOI 10.1007/978-94-007-5070-8
Springer Dordrecht Heidelberg New York London
Library of Congress Control Number: 2012942029

© The Author(s) 2012

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To Laura,
For believing in me and fostering my passions and dreams, always

Foreword

When the authors asked me to write the foreword to this book, I readily accepted since it is a subject that intrigues me and one for which I realized that I have much to learn. It was only as I got deeper into this work that I realized what a seminal and important contribution to information technology this book has the potential to be. This book will likely become a classic in the field of Sentic Computing and be adopted as a textbook for students of the subject. It provides not only a comprehensive overview of the foundational elements of the topic, but introduces new methods along with teaching and research challenges for the future.

There is no doubt of the impact that the World Wide Web has had on human communication and discourse and will continue to have in the unforeseeable future. As its communications capabilities grow, its skills must also grow. I do not mean the skills of Web users, but the skills of the Web itself. The future Web will not just tie together documents, data, and people, it will also become a vastly connected world of processes, objects, and things all of which will have unique identities and personalities. Like the humans who use the Web, these objects must have common sense, be affective, and perhaps even emotional. To be effective, they must be able to make decisions and judgements some of which are likely to be quite subtle. Such qualities of these objects will be embedded, hidden, and defined within the data used by these entities.
This data will become the DNA that prescribes the behavior of these objects. Like its biological counterpart, this DNA will contain the programming instructions for data manipulation governing object actions. But what will be the source of this data?

Web technology was revolutionized when users discovered the ability to use it as a platform for expressing opinions and evaluations. The previously static Web became a dynamic corpus of user-generated information such as product reviews, statistical polling data, survey data, etc. Within this information, just waiting to be uncovered, is a wealth of conceptual, expressive, and emotional content reflecting user responses to natural language questions and options. Unfortunately, the majority of this content is unstructured text containing all of the ambiguities found in spoken communications. The challenge became how to convert this information into structured data formats required for efficient and accurate computer analysis.

It was soon realized that intelligent mining of this data was a difficult task requiring the definition of new paradigms and the development of innovative analysis tools. For example, is it possible to perform clause-level semantic analysis of essentially freeform textual data which accurately allows the inference of both conceptual and emotional information? Employing methods from artificial intelligence, data mining, and such diverse disciplines as behavioral and sentiment analysis, the authors of this book explain why the answer is yes. The result is Sentic Computing.

Potential applications of Sentic Computing are widespread. Analyzed data drawn from product ratings and reviews are a marketer’s dream and provide the new currency for e-commerce websites and social media. The authors describe how Sentic Computing techniques can influence HCI (Human-Computer Interaction) and EHealth. Readers of the book will likely realize other possible applications of the methods and tools in a range of other disciplines.
I encourage the reader to carefully study this book because doing so will provide a stimulating introduction to an area of study with great future potential. It will change your mind about the value (and the potential usage) of the opinions and evaluations that you make every day. It did so for me.

Stanford, 9th April 2012
Bebo White

Preface

The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, Web communities, blogs, wikis, and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far.

Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of Web pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling, and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the basis of a large training corpus.
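As a rough illustration of the first category, keyword spotting, the following Python sketch labels a text by looking up its words in a small affect lexicon. The lexicon, the category names, and the function `spot_keywords` are hypothetical and chosen only for illustration; they are not taken from the book or from any existing tool.

```python
import re
from collections import Counter

# Hypothetical toy lexicon of fairly unambiguous affect words (an assumption,
# not a resource described in the book).
AFFECT_LEXICON = {
    "happy": "joy",
    "delighted": "joy",
    "sad": "sadness",
    "miserable": "sadness",
    "angry": "anger",
    "furious": "anger",
}


def spot_keywords(text):
    """Return the most frequent affect category among lexicon hits, or None."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Count one hit per token that appears in the lexicon.
    hits = Counter(AFFECT_LEXICON[t] for t in tokens if t in AFFECT_LEXICON)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


print(spot_keywords("I am so happy and delighted today"))
print(spot_keywords("I am not happy"))
```

Both calls print `joy`: the sketch ignores the negation in the second sentence, which is one simple case of the limited interpretive capability noted above.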
Early works aimed to classify entire documents as containing overall positive or negative polarity, or rating scores of reviews. Such systems were mainly based on supervised approaches relying on manually labeled samples, such as movie or product reviews where the opinionist’s overall positive or negative attitude was explicitly indicated. However, opinions and sentiments do not occur only at document level, nor are they limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level.

In this book, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data.
The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database, and a PatientOpinion dataset, and its performance compared both with results obtained using standard sentiment analysis techniques and using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet, and Microsoft’s Probase. Differently from most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by these. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as Social Web, HCI, and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work, will, eventually, pave the way for development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies, and learning from experience.

Acknowledgments

The completion of this book would not have been possible without the contribution of many individuals, to whom I express my appreciation and gratitude. Because the contents of this book are mainly drawn from my doctoral research work, I am first of all deeply indebted to my Ph.D. supervisors, namely: Amir Hussain, who guided me every step of the way and was an immense source of inspiration throughout; Catherine Havasi, whose guidance, encouragement, and support in the past three years has been simply invaluable; and Chris Eckl, who helped me to look at my research from many different points of view.
I am also grateful to all the other mentors and colleagues who supported me during my scientific missions and internships, in particular, Robert Speer, Kenneth Arnold, Dustin Smith, Jason Alonso, and Henry Lieberman, who assisted me in getting started, using, and further developing common sense computing tools (both during my research visits at MIT Media Lab and remotely); Campbell Grant, for his critical opinions and sound advice about the commercial aspects of my research work; Anna Esposito, for her unwavering support within the COST 2102 programme; Thomas Mazzocco and Marco Grassi, for their invaluable research contributions; Evan Magill and Kang Li, for their insights and assistance in refining the contents of this book; Joseph Lyons and Carmen Tropeano, for their help in designing some of the graphics; and James Munro, for the support and the data provided within my work in the field of patient opinion mining.

Special thanks also go to Praphul Chandra and Sudhir Dixit, who helped me expand the horizons of my research during my internship at Hewlett-Packard Labs India; Tariq Durrani, Cheng-Lin Liu, Tieniu Tan, and Derong Liu, for their guidance during my research visit at the National Laboratory of Pattern Recognition (NLPR) in the Institute of Automation of the Chinese Academy of Sciences; and Haixun Wang and Yangqiu Song, for helping me improve my skills and expertise in the field of knowledge-based systems during my internship at Microsoft Research Asia. A last, but not least, acknowledgement goes to my family and all the old and new friends who, in the past three years, have cheered me up in difficult moments, celebrated with me for my achievements, and never blamed me for being often far away from them.

My Ph.D. was one of the best experiences of my life as it gave me the possibility to: work shoulder to shoulder with scientists from top research institutes, get in touch with both far West and far East cultures, meet special people who will always be part of my life, and even exchange ideas with beautiful minds such as Tim Berners-Lee, the inventor of the Web, Bebo White, Web pioneer in the United States, and Marvin Minsky, one of the fathers of AI.

Boston, 9th May 2012
Erik Cambria
