Modeling and Data Mining in Blogosphere

113 Pages·2009·3.216 MB·English

Preview Modeling and data mining in blogosphere

Modeling and Data Mining in Blogosphere
Nitin Agarwal, University of Arkansas at Little Rock
Huan Liu, Arizona State University

Synthesis Lectures on Data Mining and Knowledge Discovery, Lecture #1
Series Editor: Robert Grossman, University of Illinois, Chicago
Morgan & Claypool Publishers, 2009

Copyright © 2009 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher.

www.morganclaypool.com
ISBN: 9781598299083 (paperback)
ISBN: 9781598299090 (ebook)
DOI: 10.2200/S00213ED1V01Y200907DMK001
Series ISSN: Synthesis Lectures on Data Mining and Knowledge Discovery, ISSN pending.

ABSTRACT

This book offers a comprehensive overview of the various concepts and research issues about blogs or weblogs. It introduces techniques and approaches, tools and applications, and evaluation methodologies with examples and case studies. Blogs allow people to express their thoughts, voice their opinions, and share their experiences and ideas. Blogs also facilitate interactions among individuals creating a network with unique characteristics. Through the interactions individuals experience a sense of community. We elaborate on approaches that extract communities and cluster blogs based on information of the bloggers.

Open standards and low barrier to publication in Blogosphere have transformed information consumers to producers, generating an overwhelming amount of ever-increasing knowledge about the members, their environment and symbiosis. We elaborate on approaches that sift through humongous blog data sources to identify influential and trustworthy bloggers leveraging content and network information. Spam blogs or splogs is an increasing concern in Blogosphere, which is discussed in detail with the approaches leveraging supervised machine learning algorithms and interaction patterns. We elaborate on data collection procedures, provide resources for blog data repositories, mention various visualization and analysis tools in Blogosphere, and explain conventional and novel evaluation methodologies, to help perform research in the Blogosphere.

The book is supported by additional material, including lecture slides as well as the complete set of figures used in the book, and the reader is encouraged to visit the book website for the latest information: http://tinyurl.com/mcp-agarwal

KEYWORDS

blogosphere, weblogs, blogs, blog model, power law distribution, scale free networks, degree distribution, clustering coefficient, centrality measures, clustering, community discovery, influence, diffusion, trust, propagation, spam blogs, splogs, data collection, blog crawling, performance evaluation

To my parents, Sushma and Umesh Chand Agarwal… –NA
To my parents, wife, and sons… –HL
…with much love and gratitude for everything.

Contents

Acknowledgments
1 Modeling Blogosphere
    1.1 Modeling Essentials
    1.2 Preferential Attachment Blog Models
        1.2.1 Log-normal Distribution Models
2 Blog Clustering and Community Discovery
    2.1 Graph Based Approach
    2.2 Content Based Approach
    2.3 Hybrid Approach
3 Influence and Trust
    3.1 Influence
        3.1.1 Graph Based Approach
        3.1.2 Content Based Approach
        3.1.3 Hybrid Approach
        3.1.4 Blog Leaders
    3.2 Trust
        3.2.1 Trust Computation
        3.2.2 Trust Propagation
4 Spam Filtering in Blogosphere
    4.1 Graph Based Approach
    4.2 Content Based Approach
    4.3 Hybrid Approach
5 Data Collection and Evaluation
    5.1 Data Collection
        5.1.1 API
        5.1.2 Web Crawler
        5.1.3 Available Datasets
        5.1.4 Data Preprocessing
    5.2 Evaluation
        5.2.1 Blog Modeling
        5.2.2 Blog Clustering and Community Discovery
        5.2.3 Influence and Trust
        5.2.4 Spam
A Tools in Blogosphere
B API Examples
Bibliography
Biography
Index
