Sentiment Analysis
Mining Opinions, Sentiments, and Emotions

Sentiment analysis is the computational study of people’s opinions, sentiments, emotions, and attitudes. This fascinating problem is increasingly important in business and society. It offers numerous research challenges but promises insight useful to anyone interested in opinion analysis and social media analysis.

This book gives a comprehensive introduction to the topic from a primarily natural language processing point of view to help readers understand the underlying structure of the problem and the language constructs that are commonly used to express opinions and sentiments. It covers all core areas of sentiment analysis; includes many emerging themes, such as debate analysis, intention mining, and fake-opinion detection; and presents computational methods to analyze and summarize opinions. It will be a valuable resource for researchers and practitioners in natural language processing, computer science, management sciences, and the social sciences.

Bing Liu is a professor of computer science at the University of Illinois at Chicago. His current research interests include sentiment analysis and opinion mining, data mining, machine learning, and natural language processing. He has published extensively in top conferences and journals, and his research has been cited on the front page of the New York Times. He is also the author of two books: Sentiment Analysis and Opinion Mining (2012) and Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (first edition, 2007; second edition, 2011). He currently serves as the Chair of ACM SIGKDD and is an IEEE Fellow.

BING LIU
University of Illinois at Chicago

32 Avenue of the Americas, New York, NY 10013-2473, USA

Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781107017894

© Bing Liu 2015

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2015
Printed in the United States of America

A catalog record for this publication is available from the British Library.

Library of Congress Cataloging in Publication Data
Liu, Bing, 1963–
Sentiment analysis : mining opinions, sentiments, and emotions / Bing Liu.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-01789-4 (hardback)
1. Natural language processing (Computer science) 2. Computational linguistics. 3. Public opinion – Data processing. 4. Data mining. I. Title.
QA76.9.N38L58 2015
006.3ʹ12–dc23 2014036113

ISBN 978-1-107-01789-4 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party Internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Preface
Acknowledgments

1 Introduction
  1.1 Sentiment Analysis Applications
  1.2 Sentiment Analysis Research
    1.2.1 Different Levels of Analysis
    1.2.2 Sentiment Lexicon and Its Issues
    1.2.3 Analyzing Debates and Comments
    1.2.4 Mining Intentions
    1.2.5 Opinion Spam Detection and Quality of Reviews
  1.3 Sentiment Analysis as Mini NLP
  1.4 My Approach to Writing This Book

2 The Problem of Sentiment Analysis
  2.1 Definition of Opinion
    2.1.1 Opinion Definition
    2.1.2 Sentiment Target
    2.1.3 Sentiment of Opinion
    2.1.4 Opinion Definition Simplified
    2.1.5 Reason and Qualifier for Opinion
    2.1.6 Objective and Tasks of Sentiment Analysis
  2.2 Definition of Opinion Summary
  2.3 Affect, Emotion, and Mood
    2.3.1 Affect, Emotion, and Mood in Psychology
    2.3.2 Affect, Emotion, and Mood in Sentiment Analysis
  2.4 Different Types of Opinions
    2.4.1 Regular and Comparative Opinions
    2.4.2 Subjective and Fact-Implied Opinions
    2.4.3 First-Person and Non-First-Person Opinions
    2.4.4 Meta-Opinions
  2.5 Author and Reader Standpoint
  2.6 Summary

3 Document Sentiment Classification
  3.1 Supervised Sentiment Classification
    3.1.1 Classification Using Machine Learning Algorithms
    3.1.2 Classification Using a Custom Score Function
  3.2 Unsupervised Sentiment Classification
    3.2.1 Classification Using Syntactic Patterns and Web Search
    3.2.2 Classification Using Sentiment Lexicons
  3.3 Sentiment Rating Prediction
  3.4 Cross-Domain Sentiment Classification
  3.5 Cross-Language Sentiment Classification
  3.6 Emotion Classification of Documents
  3.7 Summary

4 Sentence Subjectivity and Sentiment Classification
  4.1 Subjectivity
  4.2 Sentence Subjectivity Classification
  4.3 Sentence Sentiment Classification
    4.3.1 Assumption of Sentence Sentiment Classification
    4.3.2 Classification Methods
  4.4 Dealing with Conditional Sentences
  4.5 Dealing with Sarcastic Sentences
  4.6 Cross-Language Subjectivity and Sentiment Classification
  4.7 Using Discourse Information for Sentiment Classification
  4.8 Emotion Classification of Sentences
  4.9 Discussion

5 Aspect Sentiment Classification
  5.1 Aspect Sentiment Classification
    5.1.1 Supervised Learning
    5.1.2 Lexicon-Based Approach
    5.1.3 Pros and Cons of the Two Approaches
  5.2 Rules of Sentiment Composition
    5.2.1 Sentiment Composition Rules
    5.2.2 DECREASE and INCREASE Expressions
    5.2.3 SMALL_OR_LESS and LARGE_OR_MORE Expressions
    5.2.4 Emotion and Sentiment Intensity
    5.2.5 Senses of Sentiment Words
    5.2.6 Survey of Other Approaches
  5.3 Negation and Sentiment
    5.3.1 Negation Words
    5.3.2 Never
    5.3.3 Some Other Common Sentiment Shifters
    5.3.4 Shifted or Transferred Negations
    5.3.5 Scope of Negations
  5.4 Modality and Sentiment
  5.5 Coordinating Conjunction But
  5.6 Sentiment Words in Non-opinion Contexts
  5.7 Rule Representation
  5.8 Word Sense Disambiguation and Coreference Resolution
  5.9 Summary

6 Aspect and Entity Extraction
  6.1 Frequency-Based Aspect Extraction
  6.2 Exploiting Syntactic Relations
    6.2.1 Using Opinion and Target Relations
    6.2.2 Using Part-of and Attribute-of Relations
  6.3 Using Supervised Learning
    6.3.1 Hidden Markov Models
    6.3.2 Conditional Random Fields
  6.4 Mapping Implicit Aspects
    6.4.1 Corpus-Based Approach
    6.4.2 Dictionary-Based Approach
  6.5 Grouping Aspects into Categories
  6.6 Exploiting Topic Models
    6.6.1 Latent Dirichlet Allocation
    6.6.2 Using Unsupervised Topic Models
    6.6.3 Using Prior Domain Knowledge in Modeling
    6.6.4 Lifelong Topic Models: Learn as Humans Do
    6.6.5 Using Phrases as Topical Terms
  6.7 Entity Extraction and Resolution
    6.7.1 Problem of Entity Extraction and Resolution
    6.7.2 Entity Extraction
    6.7.3 Entity Linking
    6.7.4 Entity Search and Linking
  6.8 Opinion Holder and Time Extraction
  6.9 Summary

7 Sentiment Lexicon Generation
  7.1 Dictionary-Based Approach
  7.2 Corpus-Based Approach
    7.2.1 Identifying Sentiment Words from a Corpus
    7.2.2 Dealing with Context-Dependent Sentiment Words
    7.2.3 Lexicon Adaptation
    7.2.4 Some Other Related Work
  7.3 Desirable and Undesirable Facts
  7.4 Summary

8 Analysis of Comparative Opinions
  8.1 Problem Definition
  8.2 Identify Comparative Sentences
  8.3 Identifying the Preferred Entity Set
  8.4 Special Types of Comparison
    8.4.1 Nonstandard Comparison
    8.4.2 Cross-Type Comparison
    8.4.3 Single-Entity Comparison
    8.4.4 Sentences Involving Compare and Comparison
  8.5 Entity and Aspect Extraction
  8.6 Summary

9 Opinion Summarization and Search
  9.1 Aspect-Based Opinion Summarization
  9.2 Enhancements to Aspect-Based Summary
  9.3 Contrastive View Summarization
  9.4 Traditional Summarization
  9.5 Summarization of Comparative Opinions
  9.6 Opinion Search
  9.7 Existing Opinion Retrieval Techniques
  9.8 Summary

10 Analysis of Debates and Comments
  10.1 Recognizing Stances in Debates
  10.2 Modeling Debates/Discussions
    10.2.1 JTE Model
    10.2.2 JTE-R Model: Encoding Reply Relations
    10.2.3 JTE-P Model: Encoding Pair Structures
    10.2.4 Analysis of Tolerance in Online Discussions
  10.3 Modeling Comments
  10.4 Summary

11 Mining Intentions
  11.1 Problem of Intention Mining
  11.2 Intention Classification
  11.3 Fine-Grained Mining of Intentions
  11.4 Summary

12 Detecting Fake or Deceptive Opinions
  12.1 Different Types of Spam
    12.1.1 Harmful Fake Reviews
    12.1.2 Types of Spammers and Spamming
    12.1.3 Types of Data, Features, and Detection
    12.1.4 Fake Reviews versus Conventional Lies
  12.2 Supervised Fake Review Detection
  12.3 Supervised Yelp Data Experiment
    12.3.1 Supervised Learning Using Linguistic Features
    12.3.2 Supervised Learning Using Behavioral Features
  12.4 Automated Discovery of Abnormal Patterns
    12.4.1 Class Association Rules
    12.4.2 Unexpectedness of One-Condition Rules
    12.4.3 Unexpectedness of Two-Condition Rules
  12.5 Model-Based Behavioral Analysis
    12.5.1 Spam Detection Based on Atypical Behaviors
    12.5.2 Spam Detection Using Review Graph
    12.5.3 Spam Detection Using Bayesian Models
  12.6 Group Spam Detection
    12.6.1 Group Behavior Features
    12.6.2 Individual Member Behavior Features
  12.7 Identifying Reviewers with Multiple Userids
    12.7.1 Learning in a Similarity Space
    12.7.2 Training Data Preparation
    12.7.3 d-Features and s-Features
    12.7.4 Identifying Userids of the Same Author
  12.8 Exploiting Burstiness in Reviews
  12.9 Some Future Research Directions
  12.10 Summary

13 Quality of Reviews
  13.1 Quality Prediction as a Regression Problem
  13.2 Other Methods
  13.3 Some New Frontiers
  13.4 Summary

14 Conclusions

Appendix
Bibliography
Index