Attention Inequality in Social Media

Linhong Zhu and Kristina Lerman
USC Information Sciences Institute
4676 Admiralty Way
Marina Del Rey, CA 90292
{linhong,lerman}@isi.edu

arXiv:1601.07200v1 [cs.SI] 26 Jan 2016

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Social media can be viewed as a social system where the currency is attention. People post content and interact with others to attract attention and gain new followers. In this paper, we examine the distribution of attention across a large sample of users of a popular social media site, Twitter. Through empirical analysis of these data we conclude that attention is very unequally distributed: the top 20% of Twitter users own more than 96% of all followers, 93% of the retweets, and 93% of the mentions. We investigate the mechanisms that lead to attention inequality and find that it results from the “rich-get-richer” and “poor-get-poorer” dynamics of attention diffusion. Namely, users who are “rich” in attention, because they are often mentioned and retweeted, are more likely to gain new followers, while those who are “poor” in attention are likely to lose followers. We develop a phenomenological model that quantifies attention diffusion and network dynamics, and solve it to study how attention inequality grows over time in the dynamic environment of social media.

Introduction

Inequality is a pervasive social phenomenon. For example, income and wealth are not evenly distributed in modern society, but instead are concentrated among few individuals. The richest 20% in the United States own more than 80% of the country’s total wealth, while the poorest 20% own just a fraction of one percent (Norton and Ariely 2011). Inequality does not just violate our sense of fairness (Norton and Ariely 2011); it also reduces economic opportunities and has been linked to negative social outcomes, such as poor health (Kawachi and Kennedy 1999) and high crime rates (Kelly 2000), in societies with high levels of inequality.

While economic inequality has been widely discussed recently, inequalities are rampant in other domains, including science and entertainment, where a few individuals receive a disproportionate share of attention and the benefits it brings. In science, specifically, attention can be measured by the number of citations that papers (and the scientists who publish them) receive. The distribution of the number of citations is extremely unequal, with some papers receiving thousands of citations, while many others are seldom cited (Allison 1980; Klamer and Van Dalen 2002). Researchers have long speculated about the origins of inequalities. Current consensus holds that inequality cannot be explained by variation in quality of the work or individual talent (Adler 1985; Salganik, Dodds, and Watts 2006; Lariviere and Gingras 2010). Rather, it is the result of the underlying social processes, such as the “cumulative advantage” or the “rich-get-richer” effect, which brings greater recognition to those who are already distinguished (Merton 1968; Allison, Long, and Kraze 1982).

In online social media attention is also non-uniformly distributed. Some user-generated photos and videos are viewed, liked, or shared orders of magnitude more times than the rest. Psy’s Gangnam Style video, for example, was viewed over 2 billion times on YouTube, while the average video has fewer than 100 thousand views. Attention paid to people is also highly unequal. A few Twitter users have tens of millions of followers, while the majority have just a handful of followers. These popular users enjoy an unparalleled advantage: things they say on Twitter, from product endorsements to personal opinions, are seen by millions, making them potentially far more influential than ordinary users.

In this paper, we carry out quantitative analysis of data from the microblogging service Twitter to characterize and quantify attention inequality in social media. We identify a set of almost 6,000 users and monitor their activity over a period of several months. We show that attention is very unequally distributed: the top 20% of Twitter users in our sample own 97% of all followers, 93% of retweets, and 93% of mentions. By observing how the monitored users allocate their attention over time, via following, mentioning or retweeting other users, we are able to study the processes leading to attention inequality. We empirically demonstrate that the attention Twitter users receive by being mentioned or having their posts retweeted by others results in them gaining new followers. In contrast, users with fewer followers are retweeted less frequently and are more likely to lose followers. We construct a phenomenological model of attention diffusion where these two processes (mentioning and retweeting) result in changes in the structure of the underlying follower network. After parameterizing the model, we solve it to study how attention and inequality evolve over time. Our model produces a “rich-get-richer” and “poor-get-poorer” dynamic, whereby users with more followers, i.e., Twitter celebrities, are mentioned and retweeted more frequently, which results in them gaining even more followers. Our model is able to identify users who will gain more attention, and to predict the evolution of the follower network and attention inequality.
There are consequences to high attention inequality, few of which have been explored in the context of social media. High inequality focuses the attention of the online community on relatively few users, allowing them to disproportionately benefit from the attention they receive. Companies compensate social media celebrities for endorsing their [...] of content and viewpoints to which people are exposed. But inequality may also have hidden benefits. When every single person has watched the same video, or read the same story, it gives people a common topic for conversation (and tweeting) and a common vocabulary with which to discuss it. Thus, inequality reinforces social identity.

Our work raises a number of questions, including: why do users tolerate extreme attention inequality? How does it affect user behavior? Does it reduce their engagement with Twitter, or does it inspire them to be more active in gaining attention? And not least, should Twitter, and other social media sites, do something to ensure attention is more equitably distributed? We hope that our work stimulates the research community to address these questions.

Quantifying Attention Inequality

In this section we quantitatively characterize the attention received by users and the content they post on Twitter. Although we focus on Twitter, our definitions of attention generalize to other social media platforms.

Data Description

We collected data from Twitter over a period from March 2014 to Oct 2014 using the following strategies. First, we randomly selected 5600 seed users whose user ids lie in the range [0, 760000000]. We then used the Twitter API to periodically query the friends of seed users (i.e., whom the seed users were following), which resulted in a dynamic network where each edge had a time stamp with its creation and/or deletion date. Next, we collected profile information for all the seed users and their friends, as well as timeline tweets. The number of tweets gives that user’s activity, or engagement, level. We tracked the retweet/reply/mention status of each tweet from its raw JSON object returned by Twitter. The number of retweets gives the number of times users have shared a particular tweet with their own followers. As we argue later, this is an important measure of attention. For the seed users, we further monitored their temporal profiles by querying the user profile weekly during the data collection period.

The data collection process resulted in 23,831,568 tweets and a dynamic network with 1,944,383 users and 17,869,415 edges. To provide more details about the statistics of our data, we analyze the distribution of the numbers of followers, friends, status updates (user activity), and the times that the tweets of the seed users, and all users in our data, were retweeted. Note that we use ‘status updates’ to refer to the aggregate tweets posted by a user since account creation, and we use ‘tweets’ to refer to the tweets a user posted during a specific time period. These distributions, plotted in Figure 1, have a characteristic long-tailed form, where a few users have extremely large numbers of friends, followers or posted tweets, while many users have few friends, followers or posted tweets. Such distributions are generally associated with high inequality.

Figure 1: Distribution of the numbers of (a) followers, (b) friends, (c) status updates posted, and (d) retweets received for seed users and all users. We logarithmically bin data for smoothing.

Except for activity (number of posted tweets) and number of retweets, the distributions for seed and all users are very similar, which indicates the success of our random seed user selection strategy. The small differences between the distribution of number of tweets among seed users and among all users arise because the profile information of seed users is obtained from raw JSON objects of tweets, and thus users who never post any tweets cannot be selected as seed users in our data collection. Due to these differences, the distribution of number of retweets among seed users also slightly differs from that among all users.

Attention Inequality

How much attention do Twitter users receive? Certainly, it makes sense to use the number of followers as a measure of attention: the more followers you have, the more people will see the messages you post. However, there is also a wide distribution in user activity, and a person with a million followers who never posts anything will not receive any attention. Therefore, we propose two additional measures of attention: the number of times a user’s posts are retweeted, or reshared by others, and the number of times a user is mentioned by others. These measures of attention are related: on average, the number of times a user’s post is retweeted or the number of times a user is mentioned is proportional to the number of followers the user has. However, using the total number of retweets allows us to include variation in user activity in a measure of received attention, while using the total number of mentions includes effects of celebrity. Since we have the network, profile and activity information only for seed users, we analyze how these seed users allocate their attention to other users.

Inequality of a distribution can be characterized using the Lorenz curve, which graphically represents the cumulative distribution function of the empirical probability distribution. In other words, it plots the cumulative share of value, e.g., number of followers, as a function of the cumulative percentage of the population who have at most that many followers. Perfect equality is the straight line y = x, in which the bottom x% of people have x% of the followers. However, when inequality exists, the Lorenz curve is highly skewed. Figure 2(a) shows the Lorenz curves for the number of followers of seed users, the number of times they were mentioned, and the number of times their posts were retweeted. It shows that attention is unequally distributed: the top 1% of seed users account for approximately 50% of the total retweets, 60% of the total followers, and 66% of the total mentions.

Another way to characterize the inequality of a distribution is with the Gini coefficient (Gini 1997). Given a sample of data $x_1, \ldots, x_N$ with mean $\bar{x}$, the Gini coefficient can be calculated from the following equation:

$$ g = \frac{\sum_{i}\sum_{j} |x_i - x_j|}{2 N^2 \bar{x}} \tag{1} $$

The Gini coefficient is zero, meaning there is perfect equality, when all values in a sample are the same, and it approaches one, meaning maximum inequality, when one number in the set is nonzero and the rest are zeros. The Gini coefficient also equals twice the area between the Lorenz curve of a distribution and the line of perfect equality. The Gini coefficient of the number of followers in Fig. 2(a) is g = 0.9412, for the number of mentions it is g = 0.9133, and for the number of retweets it is g = 0.9034. The Gini coefficient of the much discussed income inequality is less than 0.5 for the United States, and the much larger wealth inequality has a Gini coefficient of around 0.8.¹ Attention inequality on Twitter is staggering. The vast majority of users do not [...] number of tweets they post. Figure 2(b) shows the Lorenz curves of these distributions for the seed users in our data. Results indicate that user engagement is also unequally distributed, with a Gini coefficient of g = 0.6831 for the number of friends and g = 0.5939 for the number of tweets. Although still unequal, user activity is more equally distributed than attention.

¹ http://en.wikipedia.org/wiki/List_of_countries_by_distribution_of_wealth

Figure 2: Lorenz curves and Gini coefficients of inequalities of (a) attentions and (b) activities among seed users.

Dynamics of Inequality

Is high inequality simply an artifact of Twitter’s young age, and will it correct itself over time as users distribute their attention more equitably? Or is the inequality increasing over time? Figure 3 shows how inequality changed over time during the data collection period. The plot shows the Gini coefficients of different measures of attention at three points in time. The inequality of the number of followers and of the number of retweets of a user ($RT^{user}_{seed}$) is increasing over time, while attention to content, given by the number of times different items are retweeted by all users ($RT^{item}_{all}$), fluctuates over time. The attention inequality in terms of number of mentions increases over time, but the margin is too small to be significant. The plot also shows that the attention that items posted on Twitter receive ($RT^{item}_{all}$) is somewhat more equitably distributed than the attention paid to users.

Figure 3: Change in inequality among users over time, at three points in time (2014-03-24, 2014-08-12, 2014-10-14). Here $RT^{item}_{all}$ indicates the number of retweets that a tweet posted by any seed user received from all users in a one-week period, while $RT^{user}_{seed}$ / $M^{user}_{seed}$ denotes the number of retweets/mentions a user received from seed users in a one-week period.

In addition, we calculate inequality of user activity at different points in time, which includes the number of friends, number of status updates, number of unfriends (i.e., number of lost followers), and number of new friends gained at different time periods. The results are plotted in the lower part of Figure 3. Note that we use number of status updates to denote the aggregate number of tweets users made since account creation. These results indicate that activity inequality also increases over time, which indicates that both new content and new links are created by a small set of super-active users. Also, activity inequality, while high, is substantially less than attention inequality.

These findings suggest that there are processes that drive attention to be concentrated on few individuals at the exclusion of others, even though user engagement is far more equally distributed. In the next section we examine these questions to construct a model of the dynamics of attention.

Attention Diffusion Model

[...] attention of potential new followers. As these subsequently link to the original users, the Twitter network grows, bringing even more attention to the original users.

One of the best-known models of network evolution is the BA model (Barabasi and Albert 1999) with its preferential attachment mechanism. In this model, a new node links to an existing node with probability proportional to its degree. This makes it more likely for high degree nodes to acquire new links, resulting in a “rich get richer” phenomenon. This is a plausible model for the growth of the Twitter follower network: users who already have many followers are more likely to get attention and attract even more followers. However, the Twitter platform is highly complex and dynamic: users post messages, which may then be retweeted by others. These retweets create opportunities for new follow links to be formed (Weng et al. 2013). Users are also mentioned by others, which also creates opportunities for new follow links (Zhu and Lerman 2014). However, links are often broken as users unfollow others (Myers and Leskovec 2014). Below we empirically characterize how these factors contribute to the growth, and the loss, of attention on Twitter.

Twitter users are heterogeneous, and popular users who are “wealthy” in attention could behave in a qualitatively different manner from the less popular users. To partially control for heterogeneity, we split users into classes according to the number of followers they have. For simplicity, we use five classes, corresponding to quintiles in the number of followers. Table 1 reports statistics of the top three quintiles. Users in the first quintile, representing the 20% of the users with most followers, have over 29K followers on average and tweeted over 14K status updates on average. Users in the second quintile, representing the next 20% of most popular users, have about 1K followers on average and posted about 7K tweets, while users in the third quintile have a little more than 400 followers and posted about 4K times. Users in the bottom two quintiles did not have much activity and were excluded from analysis. The main idea behind this division is that users within the same quintile will be more homogeneous and similar to each other.

Table 1: Statistics of classes when users are divided into quintiles based on the number of followers they have. The numbers represent the average number of followers and the average activity of users in each class.

Class   Quintile   #statuses   #followers
Q1      1st        14,086      29,021
Q2      2nd         7,184       1,058
Q3      3rd         4,053         445

After creating more homogeneous populations, we examine how the attention users receive leads them to gain, or lose, followers. Again, we quantify attention by the number of times users are mentioned and the number of times they are retweeted by their followers. Both of these depend on the number of followers users currently have. While this is not directly under a user’s control, they can try to increase the number of followers by following other users, hoping that some of these will result in reciprocal links. Users can also attempt to increase the number of times they are retweeted by posting more tweets, although bursts of tweeting activity may cause followers to classify users as spammers and unfollow them. Below we examine how these factors are correlated with the attention users receive and their probability to gain or lose followers.

Observation 1 (Effect of tweeting) Users who post more tweets receive more attention.

We examine the relationship between posting activity, measured by the number of status updates users tweet, and the number of followers they have. Figure 4(a) confirms an earlier finding (Hodas, Kooti, and Lerman 2013) that activity and number of followers are positively correlated. Posting activity also affects how much attention a user’s posts receive, which we measure by the number of times the posts are retweeted. Figure 4(b) shows a clear positive correlation between activity and number of retweets for the first three user quintiles. Since each retweet increases the user’s visibility by bringing his or her name into the feeds of the retweeting user’s followers, it also increases the likelihood of gaining new followers (Weng et al. 2013; Zhu and Lerman 2014). We do not show the plot that evaluates the relationship between posting activity and mentions, since that correlation is relatively low (0.0259).

Figure 4: Relation between posting activity and attention ((a) number of followers, (b) number of retweets). The more tweets a user posted, the more followers and retweets the user received.

Figure 5: Relation between following activity and attention ((a) number of followers, (b) number of retweets).

Figure 6: Probability of gaining new followers vs (a) the number of retweets and (b) mentions users receive.

Observation 2 (Effect of following) Following others does not necessarily increase attention.
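The quintile split behind Table 1 and the correlation coefficients quoted in these observations can be illustrated with a minimal sketch. It is not the authors' code: the toy data are hypothetical, the helper names (`quintile_classes`, `class_means`, `pearson`) are introduced only for illustration, and the choice of a Pearson coefficient on log-transformed counts is an assumption, since the text does not state which coefficient was used.

```python
import math

def quintile_classes(followers):
    """Return five lists of user indices, most-followed class first."""
    order = sorted(range(len(followers)), key=lambda i: followers[i], reverse=True)
    size = math.ceil(len(order) / 5)
    return [order[k * size:(k + 1) * size] for k in range(5)]

def class_means(values, classes):
    """Average of `values` within each non-empty class."""
    return [sum(values[i] for i in c) / len(c) for c in classes if c]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical toy data, one entry per user (not taken from the paper).
followers = [29000, 15000, 1200, 900, 450, 400, 60, 40, 5, 2]
statuses = [14000, 9000, 7000, 6500, 4000, 3900, 300, 200, 10, 5]

classes = quintile_classes(followers)
mean_followers = class_means(followers, classes)  # analogue of the #followers column
mean_statuses = class_means(statuses, classes)    # analogue of the #statuses column

# Assumption: correlate log10 counts because both quantities are heavy-tailed.
# All counts must be positive for the logarithm to be defined.
log_f = [math.log10(f) for f in followers]
log_s = [math.log10(s) for s in statuses]
r = pearson(log_s, log_f)

print(mean_followers[:3], mean_statuses[:3], round(r, 3))
```

Restricting `pearson` to the indices of a single class would give a per-quintile coefficient of the kind reported for the top three quintiles.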
Next, we examine whether by following more people, users can increase the attention they receive. In general, there is a positive correlation between the number of friends users follow and the number of followers they have (Figure 5(a)). The exceptions to this trend are the few outliers who have fewer friends than followers. Most of these users represent organizational accounts, such as YouTube with only 851 friends but 47.6 million followers, or celebrities, such as Lady Gaga with 133 thousand friends but 43.6 million followers. However, as shown in Figure 5(b), there is no strong correlation between the number of friends and the number of retweets users receive. Similarly, the correlation between following activity and number of mentions is only 0.0124. Therefore, following more people does not translate into increased attention.

Observation 3 (Effect of attention) The more attention users receive, via retweets and mentions, the more followers users will gain.

As previously observed, each retweet (Weng et al. 2013) and mention (Zhu and Lerman 2014) by followers increases the attention the user receives by bringing his or her name into the feeds of other users. These others may subsequently decide to follow the mentioned or retweeted user, increasing the number of followers he or she has as a result. Figure 6(a) tests this idea by plotting the mean probability of acquiring a new follower as a function of retweet count among the first three user quintiles. Generally, there is a strong positive correlation between the number of retweets and the probability of getting new followers. Moreover, the correlation among users with the largest number of followers is even stronger (Figure 6(a)).

Figure 6(b) shows the relationship between the number of mentions users receive and the probability of gaining new followers. Interestingly, mentions serve to attract new followers mainly for the most popular users, i.e., those with most followers.

Figure 7: Probability of gaining followers vs the number of current followers users have.

Observation 4 (The Matthew effect) The more attention users have, the more attention they will receive. The less attention users have, the easier it is to lose attention.

The feedback between users’ popularity, measured by the number of followers they have, and the attention they receive via retweets and mentions, leads to a “rich-get-richer” phenomenon. Figure 7 empirically validates the positive correlation between the mean probability of acquiring a new follower and the number of followers users already have. Note that we log-binned the data to improve statistics; however, since there are few users with very large numbers of followers, there are large fluctuations in the extreme right portion of the plot due to poor statistics. More followers will, in turn, create more attention and even more followers. Overall, it appears that the Matthew effect (Merton 1968) also operates in social media: “attention breeds attention.”

In addition to gaining followers, users can also lose followers in the dynamic Twitter network (Myers and Leskovec 2014). In fact, we observed nearly as many new links created as destroyed. Figure 8 plots the number of new follow links created and the number of follow links broken by seed users during each time period. Both created links and broken links contribute significantly to the dynamics of the network. The average number of new friends followed per user in a two-week period is around 20.87, and the average number of unfollowed users per person is around 19.04. These statistics show that broken links occur as frequently as new links in social media.

Figure 8: Total number of new links and broken links made by seed users in half-month periods.

Who loses followers and why? As shown in Figure 9(a), users with many followers (>1,000) are far less likely to be unfollowed than users with few followers (<100). Unfollow probability does not vary by posting activity: for the same number of followers, users who post many status updates are not more likely to be unfollowed than users who post few status updates.

Figure 9(b) plots the probability of losing followers as a function of the number of times a user’s posts are retweeted. Clearly, the more times the users are retweeted, the less likely they are to lose followers, perhaps because others find their tweets valuable.

Figure 9: The current attention ((a) number of followers, (b) number of retweets) against the mean probability of followers they lost. Here we use logarithmic bin size.

We also studied whether bursts of tweeting activity result in a loss of followers. Too many tweets appearing all at once in followers’ feeds could be interpreted as spamming behavior, resulting in followers breaking the link. We adopt a kurtosis metric to evaluate the burstiness of tweeting. Specifically, the kurtosis metric is defined as follows:

$$ \mathrm{Kurt}(u) = \frac{E\left[(P_u - \bar{P}_u)^4\right]}{\left(E\left[(P_u - \bar{P}_u)^2\right]\right)^2} $$

where $P_u$ is a random variable that denotes the number of tweets posted by u at each time period, and $\bar{P}_u$ is the average number of tweets per time period. We found, however, that tweeting burstiness does not necessarily increase the probability of either gaining new followers or losing followers. The correlation between burstiness and the probability of gaining new followers for the top 3 quintiles is 0.0797, -0.085 and 0.1596 respectively, while the correlation between burstiness and the probability of losing followers for the top 3 quintiles is 0.0765, 0.0451, and -0.0811. Therefore, for users who have low attention, burstiness of tweeting might help gain new followers. In addition, for users who have much attention, burstiness of post visibility might cause information overload and lose followers.

Collectively these observations suggest that both prongs of the Matthew effect operate to change the number of followers and increase the inequality of attention: those with most followers also acquire new followers at a faster rate, and those with the least followers also lose them at a faster rate. Since the number of followers is directly related to how much attention users receive, those rich in attention get richer, and those poor in attention get poorer. However, users can control to some extent how much attention they receive, at least via retweeting, by increasing their posting activity without worrying that they will lose followers as a result.

Model Description

Attention diffusion provides a mechanism for the evolution of the Twitter follower network. Below we introduce a dynamic model where the likelihood of forming a new follow link to a user, or breaking an existing link, depends on the amount of attention the user receives. Thus, the main variables used by the model include the number of followers users have ($f_i$), the number of times they tweet ($p_i$), and the number of times they are mentioned ($m_i$) or their posts are retweeted ($r_i$). These variables are summarized in Table 2.

Table 2: Notations and explanations.

Variable   Description
$p_i$      number of tweets made by user i
$f_i$      number of followers of i
$r_i$      number of retweets of i’s posts by followers
$m_i$      number of mentions of i by followers
$P_f^+$    probability of gaining followers
$P_f^-$    probability of losing followers

According to the observations of the previous section, the average probability of gaining new followers $P_f^+$ is a function of the number of times the user is mentioned or retweeted. Figures 6(a) and 6(b) suggest that in our data set this function can be approximated with

$$ P_f^+ \sim w_1 r^{\alpha} + w_2 m^{\beta}, \tag{2} $$

where $w_1$ controls the contribution of retweets to the formation of new links, and $w_2$ controls the contribution of mentions to new links.

Similarly, Figure 9 suggests that for our data, the average probability of unfollowing $P_f^-$ can be approximated with:

$$ P_f^- \sim w_3 r^{-\theta}. \tag{3} $$

The equations above allow us to specify how the number of followers of a user i changes over time. Let $f_i$ be a dynamic variable representing the number of followers user i has at a given time. Then the rate at which user i gains new followers is:

$$ \frac{df_i^+}{dt} = (N - f_i)\left(w_1 r_i^{\alpha} + w_2 m_i^{\beta}\right) \tag{4} $$

where N is the size of the user population, and $N - f_i$ denotes the number of available potential followers, i.e., users who do not yet follow user i.

We rewrite Eq. 4 as

$$ \frac{df_i^+}{dt} = b_i f_i - b_i N \tag{5} $$

where $b_i = -w_1 r_i^{\alpha} - w_2 m_i^{\beta}$. By integrating Eq. 5, we obtain:

$$ f_i^+(t) = C e^{b_i t} - b_i N t \tag{6} $$

Similarly, the rate at which user i loses followers is defined as

$$ \frac{df_i^-}{dt} = f_i w_3 r_i^{-\theta} \tag{7} $$

By integrating Eq. 7, we obtain:

$$ f_i^-(t) = C_2 e^{w_3 r_i^{-\theta} t} \tag{8} $$

Combining Eq. 6 with Eq. 8 and using the fact that $f_i(t) = f_i(0)$ when t = 0, we have $C_2 = C$. Then the dynamic number of followers can be computed as:

$$ f_i(t) = f_i(0) + C e^{b_i t} - C e^{w_3 r_i^{-\theta} t} - b_i N t \tag{9} $$

Parameter Estimation

The model is parameterized by constants α, β, θ, C, $w_1$, $w_2$, and $w_3$. We estimate these parameters using regression on data between March 20 and July 20. Since different populations of users (in different quintiles) might be characterized by different parameters, in the following, we separate users by quintile and estimate model parameters separately for each population. We set the unit of time interval as four days. The results of estimated parameters are reported in Table 3.

Table 3: Model parameters estimated by regression.

Parameter   Q1        Q2         Q3
α           [...]     [...]      [...]
β           [...]     [...]      [...]
θ           0.129     -0.730     -0.020
$w_1$       0.00215   0.0        0.00006
$w_2$       0.00038   0.00030    0.0
$w_3$       0.00836   -0.00135   -300.0
C           8546      754        -9
RMSE        52800     406        334

Table 3 indicates that different user populations attract new followers in different ways. For example, the values of parameters β and $w_2$, which control how mentions generate new followers, are much higher for users in the first quintile ($Q_1$) than in the other quintiles. This shows that users with most followers are more likely to be mentioned by others and consequently followed. The most-followed users are likely to be celebrities. When they are mentioned, and people who do not yet follow them see their name, they may recognize the name and decide to follow the celebrity in turn. For the more ordinary users in the third quintile $Q_3$, mentions do not lead to any new followers. For them, attention does not translate into new followers. In addition, for users in the first quintile, the number of followers lost decays exponentially with the number of retweets, indicating they are much less likely to lose followers if they post good content, even lots of it. In contrast, for users in the second and third quintiles $Q_2$ and $Q_3$, the number of followers lost decays sub-linearly with the number of retweets. In other words, they are more likely to lose followers than first quintile users, even when they post content that receives a similar level of attention. The attention-rich users have an advantage over the rest of the users: they accrue new followers more easily and don’t lose them as much as the rest of the users.

Solving the Model

By solving the model (Eq. 9) with the learned parameters (Table 3), we can predict the number of followers each seed user in our data has at any time after the training time period. Here we set the initial condition, that is, $f_i(0)$, as the number of followers each seed user had on July 20, 2014. Figure 10 compares the predicted distribution of followers of seed users on October 14, 2014 to the actual number of followers they have on that date (ground truth). This validates that the proposed model can correctly replicate the distribution of attention to users.

Figure 10: Distribution of the predicted and actual number of followers of seed users.

Next, we examine how well the proposed model predicts the dynamics of follower inequality. Figure 11 compares the inequality of the predicted and actual number of followers at three different points in time. The attention diffusion model results in high follower inequality, with values of the Gini coefficient around 0.94. In addition, in the attention diffusion model, inequality grows over time as attention becomes more concentrated on a small number of users. The trends using the proposed attention model are close to the empirical ground truth.

Figure 11: The Gini inequality of predicted number of followers $FL_{model}$ versus that of empirical ground truth number of followers $FL_{GT}$, at three points in time (2014-03-24, 2014-08-12, 2014-10-14). In addition, the Gini of predicted number of retweets $RT_{model}$ and predicted number of mentions $M_{model}$ is compared with that of empirical number of retweets $RT_{GT}$ and empirical number of mentions $M_{GT}$. Since the number of retweets is related to posting activity, we also report the inequality of posting activity during a one-week interval, $tweet_{GT}$.

We can also use the attention diffusion model to predict inequality of attention via retweets over time. We model the number of retweets users receive as a function of the number of followers they have and their tweeting activity, and then predict the number of retweets using regression. Specifically, the number of retweets a user received can be approximately modeled as:

$$ r_i = a f_i^{b} + c p_i^{d} \tag{10} $$

We can then predict the number of retweets users get at any time by using their predicted number of followers and their observed tweeting activity at that time. As shown in Figure 11, the Gini coefficient of the predicted number of retweets rises over time; however, it is much less than the ground truth value. The possible reason is that in Eq. 10, the number of retweets is directly related to tweeting activity. Since tweeting activity has much lower inequality, the inequality of the number of predicted retweets is also lower than the ground truth value.

[...] to use it to model the attention to messages being tweeted. Besides, results in Figure 11 show that the attention users received in terms of the number of retweets and the number of mentions may not simply be modeled by Eq. 10 and Eq. 11. One future direction is to model attention diffusion in terms of number of retweets and number of mentions with both network and content information.

Related Works

Our work touches on topics from a variety of fields. We briefly review some of the relevant literature.

Inequality

The study of economic inequality has a long history, beginning with the quantitative study of personal wealth distribution by Vilfredo Pareto (Pareto 1896). In this work, the empirical data analysis showed that the tail of the wealth distribution follows a power-law. More recent analysis showed that wealth inequality (in the U.S.) is not only large (Norton and Ariely 2011) but increasing with time (Piketty and Saez 2014). A number of models (Champernowne 1953; Scafetta, Picozzi, and West 2004; Silva and Yakovenko 2005) have been suggested to understand the features of empirical wealth distributions and relate them to appropriate mechanisms. For example, Banerjee and Yakovenko (Banerjee and Yakovenko 2010) adopted the Fokker-Planck equation model for the income distribution, which leads to an approximately exponential form for small and mid-range incomes, and a power-law for the highest incomes.

Inequality exists everywhere online (Wilkinson 2008). On the World Wide Web, 80% of the browsing traffic goes to 5% of the hostnames (McCurley 2007). The browsing distribution is also highly skewed among URLs, sites and domains. Fortunato et al. (Fortunato, Flammini, and Menczer 2006) suggest that search engines bias user traffic by their page ranking strategies. They observed a high correlation between click probability and hit rank. Therefore, the ranking process has the potential to skew browsing attention to the top ranked URLs.

Recently, with the advance of social networks, the study of socio-economic inequalities has sparked renewed interest from researchers.
Specifically, much research has Similar to above process, we model the number of men- focusedonabetterunderstandingofthenatureandoriginof tionsusersreceivedasafunctionofnumberoffollowers: inequalitiesfromuserbehaviorandnetworkaspects.Among m =aebfi (11) them, Fuchs and Thurner (Fuchs and Thurner 2014) ana- i lyzed data from the virtual economy of the massive multi- WithEq.11,wepredictthenumberofmentionsatthree player online game, to find the explanations for origins of different time points and report its Gini coefficient in Fig- wealth inequality in terms of behavior and network. Their ure 11. The Gini coefficient of the predicted value, is very resultssuggestedthatwealthyplayershavehighin-andout- close to the Gini coefficient of number of followers and degree in the trade network; while in contrast, players that thegroundtruth.However,themarginaldifferencesbetween are not socially well-connected are significantly poorer on two different time points in the ground truth, are relatively average. In addition to wealth inequality, Chatterjee (Chat- smallerthanthoseintheprediction.ThisisduetoinEq.11, terjee 2014) have also studied the city size inequality, firm numberoffollowersistheonlydependantfactorofnumber inequality, voting inequality, opinion inequality and their ofmentions. origins from a statistical perspective. In this work, we aim Inconclusion,theproposedmodeldescribesattentiondif- tostudytheattentioninequalitiesinsocialmediaandtoun- fusion among users in a network and can successfully pre- derstandsomepossiblereasonsthatleadtotheinequalities. dictthenumberoffuturefollowerseachuserwillhaveand Inaddition,wearealsointerestedinunderstandinghowin- theresultingattentioninequality.However,itisunclearhow equalitieschangesovertime. NetworkGrowth model,attentioninequalitygrowsintime. Our work leaves many questions unanswered. What are Nodeletal.(NoelandNyhan2011)analyzedthe“unfriend- the benefits of inequality? 
Is it due to easier to deal with ing”behaviorusinglongitudinalsocialnetworkdata.Their information overload if users only have to pay attention to resultsindicatethatifthereisanon-negligiblefriendshipat- popularitems?Whatarethedisadvantagesofinequality?Is tritionovertime,homophilyinfriendshipretentioninduces it associated with less diversity in content and viewpoints significant upward bias to peer influence. Kwak (Kwak, online?Whatmotivatespeopletobeengagedandcontribute Chun,andMoon2011)studiedthedynamicsofunfollowing tosocialmediawhentheydonotreceiveanyattention?We in Twitter and showed that the reciprocity of the relation- plantoaddresssomeofthesequestionsinfuturework. ships,thedurationofarelationship,thefollowee’sinforma- tiveness,andtheoverlapoftherelationships,affectthedeci- References siontounfollow.MyersandLeskovec(MyersandLeskovec 2014) studied the bursty dynamics of the Twitter Informa- [Adler1985] Adler,M.1985.Stardomandtalent.TheAmer- tionNetworkandfoundthatthedynamicsofnetworkstruc- icanEconomicReview75(1):208–212. turecanbecharacterizedbysteadyratesofchangebutwith [Allison,Long,andKraze1982] Allison, P. D.; Long, J. S.; interruptionbysuddenbursts.Theyalsodevelopedamodel andKraze,T.K.1982.Cumulativeadvantageandinequality thatquantifiesthedynamicsofthenetworkandpredictsthe inscience. Ame.SociologicalReview47(5):615–625. burstyofinformationdiffusionevents. [Allison1980] Allison,P.D. 1980. InequalityandScientific An array of models for describing the growth of social Productivity. SocialStudiesofScience10(2):163–179. networks have been described in literature. Random Walk Model (Va´zquez 2003) was introduced to model the ran- [BanerjeeandYakovenko2010] Banerjee, A., and domized process of creating new links in a social network. Yakovenko, V. M. 2010. Universal patterns of inequality. The BA Model (Barabasi and Albert 1999) uses the pref- NewJournalofPhysics12(7):075032. 
erential attachment mechanism (Va´zquez 2003) to gener- [BarabasiandAlbert1999] Barabasi, A.-L., and Albert, R. ate the rich-get-richer phenomenon. The Nearest Neighbor 1999. Emergence of scaling in random networks. Science Model (Newman 2001) was proposed based on the obser- 286(5439):509–512. vationthattheprobabilityoftwounacquainteduserstobe- [Champernowne1953] Champernowne, D. 1953. A model comefriendsincreaseswiththe(weighted)numberofcom- forincomedistribution. EconomicJournal63:318–351. mon friends. Durr et al. (Durr, Protschky, and Linnhoff- [Chatterjee2014] Chatterjee, A. 2014. Socio-economic in- Popien2012)proposedanewmodelbasedonhomophilyfor equalities:astatisticalphysicsperspective.Technicalreport, bothP2Pandcentralizedonlinesocialnetwork.Theirmodel arXiv.org. is able to explain the distribution and frequency of interac- tions in online social networks. In our attention model, we [Durr,Protschky,andLinnhoff-Popien2012] Durr, M.; aim to characterize the attention diffusion and network dy- Protschky, V.; and Linnhoff-Popien, C. 2012. Modeling namicmechanisms.Inaddition,weattemptforexplanations social network interaction graphs. In IEEE/ACM Interna- forinequalitiesand inequalitydynamicsfromour attention tionalConferenceonAdvancesinSocialNetworksAnalysis model. andMining(ASONAM),660–667. [Fortunato,Flammini,andMenczer2006] Fortunato, S.; Conclusion Flammini, A.; and Menczer, F. 2006. Scale-free network growthbyranking. Phys.Rev.Lett.96:218701. We analyzed how people allocate their attention to other [FuchsandThurner2014] Fuchs, B.,and Thurner, S. 2014. users on a popular social media site Twitter by following Behavioral and network origins of wealth inequality: In- them,sharing(retweeting)themessagestheypost,andmen- sightsfromavirtualworld. PLoSONE9(8):e103503. tioningthemintheirownposts.Weconcludedthatattention is non-uniformly distributed, with a small number of users [Gini1997] Gini, C. 1997. 
Concentration and dependency dominatingtheattentionreceived.Thisresultsinahighat- ratios. RivistadiPoliticaEconomica87:769–792. tention inequality, much higher than even the widely dis- [Hodas,Kooti,andLerman2013] Hodas, N. O.; Kooti, F.; cussedeconomicinequalities.Moreover,weshowedthatat- and Lerman, K. 2013. Friendship paradox redux: Your tentioninequalityisincreasingonTwitter. friendsaremoreinterestingthanyou. InProceedingsofthe Toexplaintheriseofattentioninequality,westudiedhow 7ThInternationalAaaiConferenceOnWeblogsAndSocial the structure of the follower network changes due to at- Media(ICWSM). tention diffusion. We showed empirically that the attention [KawachiandKennedy1999] Kawachi, I., and Kennedy, users receive by being mentioned or having their messages B. P. 1999. Income inequality and health: pathways and retweeted leads to them gain new followers. The more at- mechanisms. Healthservicesresearch34(1Pt2):215–227. tention users have, the less likely they are to lose follow- ers,resultingina“richgetricher,poorgetpoorer”dynamic. [Kelly2000] Kelly,M. 2000. Inequalityandcrime. Review We constructed a phenomenological model of network dy- ofEconomicsandStatistics82(4):530–539. namics based on these observations, and used the model to [KlamerandVanDalen2002] Klamer, A., and Van Dalen, predict the number of followers a user will have in the fu- H. P. 2002. Attention and the art of scientific publishing. ture,andestimatefuturelevelsofattentioninequality.Inthis JournalofEconomicMethodology9(3):289–315. [Kwak,Chun,andMoon2011] Kwak, H.; Chun, H.; and Proceedings of the ASE/IEEE Conference on Social Com- Moon, S. 2011. Fragile online relationship: A first look puting. at unfollow dynamics in twitter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Sys- tems,1091–1100. ACM. [LariviereandGingras2010] Lariviere, V., and Gingras, Y. 2010. Theimpactfactor’sMattheweffect:anaturalexperi- mentinbibliometrics. 
JournaloftheAmericanSocietyfor InformationScienceandTechnology61(2):424–427. [McCurley2007] McCurley,K. 2007. Incomeinequalityin theattentioneconomy. http://mccurley.org/papers/effective. [Merton1968] Merton,R.K. 1968. TheMatthewEffectin Science. Science159(3810):56–63. [MyersandLeskovec2014] Myers, S. A., and Leskovec, J. 2014. The bursty dynamics of the twitter information net- work. InProceedingsofthe23rdInternationalConference onWorldWideWeb,913–924. ACM. [Newman2001] Newman, M. E. J. 2001. Clustering and preferentialattachmentingrowingnetworks. Phys.Rev.E. [NoelandNyhan2011] Noel, H., and Nyhan, B. 2011. The ??? unfriending??? problem: The consequences of ho- mophilyinfriendshipretentionforcausalestimatesofsocial influence. SocialNetworks33(3):211–218. [NortonandAriely2011] Norton,M.I.,andAriely,D.2011. Building a better America?One wealth quintile at a time. PerspectivesonPsychologicalScience6(1):9–12. [Pareto1896] Pareto,V. 1896. Coursd’EconomiePolitique. [PikettyandSaez2014] Piketty, T., and Saez, E. 2014. In- equalityinthelongrun. Science344(6186):838–843. [Salganik,Dodds,andWatts2006] Salganik, M.; Dodds, P.; andWatts,D. 2006. ExperimentalStudyofInequalityand Unpredictability in an Artificial Cultural Market. Science 311(5762):854–856. [Scafetta,Picozzi,andWest2004] Scafetta, N.; Picozzi, S.; and West, B. J. 2004. An out-of-equilibrium model of the distributionsofwealth. QuantitativeFinance4(3):353–364. [SilvaandYakovenko2005] Silva, A. C., and Yakovenko, V. M. 2005. Temporal evolution of the ???thermal??? and ???superthermal???incomeclassesintheusaduring1983– 2001. EurophysicsLetters. [Va´zquez2003] Va´zquez, A. 2003. Growing network with local rules: Preferential attachment, clustering hierarchy, anddegreecorrelations. Phys.Rev.E67:056104. [Wengetal.2013] Weng, L.; Ratkiewicz, J.; Perra, N.; Gonc¸alves, B.; Castillo, C.; Bonchi, F.; Schifanella, R.; Menczer, F.; and Flammini, A. 2013. The role of infor- mationdiffusionintheevolutionofsocialnetworks. 
InPro- ceedings of the 19th ACM SIGKDD International Confer- enceonKnowledgeDiscoveryandDataMining,356–364. [Wilkinson2008] Wilkinson, D. M. 2008. Strong regular- ities in online peer production. In EC ’08: Proceedings of the9thACMconferenceonElectroniccommerce,302–309. NewYork,NY,USA:ACM. [ZhuandLerman2014] Zhu, L., and Lerman, K. 2014. A visibility-basedmodelforlinkpredictioninsocialmedia.In
