Temporal Effects on Hashtag Reuse in Twitter: A Cognitive-Inspired Hashtag Recommendation Approach

Dominik Kowald, Know-Center, Graz, Austria, [email protected]
Subhash Pujari, Graz University of Technology, Graz, Austria, [email protected]
Elisabeth Lex, Graz University of Technology, Graz, Austria, [email protected]

arXiv:1701.01276v1 [cs.IR] 5 Jan 2017

Copyright is held by the authors. ACM ISBN 978-1-4503-2138-9. DOI: 10.1145/1235

ABSTRACT
Hashtags have become a powerful tool in social platforms such as Twitter to categorize and search for content, and to spread short messages across members of the social network. In this paper, we study temporal hashtag usage practices in Twitter with the aim of designing a cognitive-inspired hashtag recommendation algorithm we call BLL$_{I,S}$. Our main idea is to incorporate the effect of time on (i) individual hashtag reuse (i.e., reusing own hashtags), and (ii) social hashtag reuse (i.e., reusing hashtags that have previously been used by a followee) into a predictive model. For this, we turn to the Base-Level Learning (BLL) equation from the cognitive architecture ACT-R, which accounts for the time-dependent decay of item exposure in human memory. We validate BLL$_{I,S}$ using two crawled Twitter datasets in two evaluation scenarios: firstly, only temporal usage patterns of past hashtag assignments are utilized and secondly, these patterns are combined with a content-based analysis of the current tweet. In both scenarios, we find not only that temporal effects play an important role for both individual and social hashtag reuse but also that BLL$_{I,S}$ provides significantly better prediction accuracy and ranking results than current state-of-the-art hashtag recommendation methods.

Keywords. Twitter; Hashtags; BLL equation; ACT-R; TF-IDF; Recency; Hashtag Recommendation; Hashtag Reuse Prediction

1. INTRODUCTION
Over the past years, the microblogging platform Twitter has become one of the most popular social networks on the Web. Users can build a network of follower connections to other Twitter users, which means that they can subscribe to content posted by their followees [31, 24]. Twitter was also the first social platform that adopted the concept of hashtags, as suggested by Chris Messina¹. Hashtags are freely-chosen keywords starting with the hash character “#” to annotate, categorize and contextualize Twitter posts (i.e., tweets) [34, 13]. The advantage of hashtags is that anyone with an interest in a hashtag can track it and search for it [38], thus receiving content posted by somebody outside of their own Twitter network. For example, users can retrieve tweets created during the European football championship by searching for the hashtag #euro2016, even if they do not have a social link to the tweet producers. Meanwhile, many social platforms, such as Instagram and Facebook, have adopted hashtags as well.

¹ https://twitter.com/chrismessina/status/223115412

Problem. Unsurprisingly, the widespread acceptance of hashtags has sparked a lot of research in the field of hashtag recommendations (see Section 6 for a selection of approaches) to support users in assigning the most descriptive hashtags to their posts. Existing methods typically utilize collaborative, content and topic features of tweets to recommend hashtags to users. Undoubtedly, these features play an important role in recommending hashtags that best describe a tweet. In this paper, however, we are especially interested in predicting which hashtags a user will likely apply in a newly created tweet given previous hashtag assignments.

The main problem we want to address is whether we can identify temporal usage patterns that influence whether a Twitter user will likely utilize a certain hashtag in a tweet, given the hashtags she and/or her followees have been using in the past. Our goal is to describe such temporal usage patterns using a model from human memory theory and to design a hashtag recommendation algorithm based on that. To the best of our knowledge, so far, few studies (e.g., [11]) have investigated the way temporal effects can be exploited in the hashtag recommendation process.

Approach and methods. We propose a cognitive-inspired hashtag recommendation algorithm we call BLL$_{I,S}$ that is based on temporal usage patterns of hashtags derived from empirical evidence. In essence, these patterns reflect how a person's own hashtags as well as hashtags from the social network are utilized and reused. In our approach, we utilize the Base-Level Learning (BLL) equation from the cognitive architecture ACT-R [2, 3] to model temporal usage of hashtags. The BLL equation accounts for the time-dependent decay of item exposure in human memory. It quantifies the usefulness of a piece of information (e.g., a hashtag) based on how frequently and how recently it was used by a user in the past and models this time-dependent decay by means of a power-law distribution. Thus, BLL$_{I,S}$ takes into consideration the frequency and recency of hashtags used by a user and her followees in the past.

We presented the BLL equation in our previous work as a model to recommend tags in social bookmarking systems such as BibSonomy and CiteULike [21, 20]. In the present work, we build upon these results by adopting the BLL equation to model the effect of time on the reuse of individual and social hashtags to build our hashtag recommendation algorithm. We demonstrate the efficacy of our approach in two empirical social networks crawled from Twitter. The first social network, termed CompSci dataset, is built upon the tweets of a sample of Twitter users, who have been identified as computer scientists in previous related work [10], and their followees. The second network, termed Random dataset, is built upon the tweets of a set of randomly chosen Twitter users and their followees.

Dataset    |U_S|    |F|        |U|        |T|          |HT|         |HTAS|
CompSci    2,551    241,225    91,776     5,649,359    1,081,403    9,161,842
Random     3,466    252,219    127,112    8,157,702    1,507,773    13,628,750
Table 1: Statistics of our CompSci and Random Twitter datasets. Here, $|U_S|$ is the number of seed users, $|F|$ is the number of followees of these seed users, $|U|$ is the number of total users, $|T|$ is the number of tweets, $|HT|$ is the number of distinct hashtags and $|HTAS|$ is the number of hashtag assignments.

We experiment with these datasets to investigate the performance of our hashtag recommendation approach in two settings: (i) tweets of a domain-specific Twitter network, and (ii) tweets of a random network of Twitter users.

Contributions and findings. The main contributions of our work are two-fold. Firstly, our paper shows that time has a large effect on individual as well as social hashtag reuse in Twitter. Specifically, we observe a time-dependent decay of individual and social hashtag reuse that follows a power-law distribution. This finding paves the way for our idea to utilize the BLL equation as a predictive model to recommend hashtags for new tweets. Thus, our second contribution is that we design, develop and evaluate a personalized hashtag recommendation algorithm based on the BLL equation that outperforms current state-of-the-art approaches.

We implement the BLL equation in two variants, where the first one (i.e., BLL$_{I,S}$) predicts the hashtags of a user solely based on past hashtag usage, and the second one (i.e., BLL$_{I,S,C}$) combines BLL$_{I,S}$ with a content-based tweet analysis to also incorporate the text of the currently proposed tweet of a user. We evaluate our approach using standard evaluation protocols and metrics, and we find that our approach provides significantly higher prediction accuracy and ranking estimates than current state-of-the-art hashtag recommendation algorithms in both scenarios. We attribute this to the fact that our approach, in contrast to other related methods, mimics the way humans use and adapt hashtags by building upon insights from human memory theory (i.e., the BLL equation).

Structure of this paper. In Section 2, we continue by describing the crawling procedure of our two Twitter datasets and analyzing hashtag usage types in these datasets. Then, in Section 3, we study temporal usage patterns of individual and social hashtag reuse. In Section 4, we describe two variants of our approach (i.e., without and with the current tweet). This is followed in Section 5 by our evaluation methodology and experimental results. Finally, we discuss related work in the field in Section 6 and we give a summary of our findings as well as our future plans in Section 7.

2. DATASETS
In this section, we describe the data collection procedure and the two datasets we use for our study. Additionally, we investigate individual as well as social hashtag reuse patterns in our datasets as a prerequisite for our hashtag recommendation approach.

[Figure 1: bar chart of hashtag assignments [%] per hashtag usage type (individual, social, individual/social, network, external) for the CompSci and Random datasets.]

Figure 1: Analysis of hashtag usage types in our two datasets. For each hashtag assignment, we study whether the corresponding hashtag has been used by the same user before in time (“individual”), by some of the users she follows (“social”), by both (“individual/social”), by anyone else in the dataset (“network”) or by none of them (“external”). We find that between 66% and 81% of hashtag assignments in both datasets can be explained by individual or social hashtag usage (i.e., the sum of “individual”, “social” and “individual/social”).

Crawling strategy and dataset statistics. In order to address our research goals, we crawl two datasets using the Search API of Twitter². The final statistics of these datasets are illustrated in Table 1. The first one (i.e., CompSci dataset) consists of researchers from the field of computer science and their followees, while the second one (i.e., Random dataset) consists of random people and their followees. Our idea is to test our hashtag recommendation approach in two different network settings: (i) a domain-specific one, in our case the domain of computer scientists, and (ii) a more general one consisting of random Twitter users. Our crawling strategy for both datasets comprises the following four steps:

(a) Crawl seed users. We start with identifying and crawling a list of seed users $U_S$ for each dataset. In the case of the CompSci dataset, we take the users who were identified as computer scientists in the work of [10]. In the case of the Random dataset, we used the Streaming API of Twitter³ in October 2015 to get a stream of tweets and extracted the user-ids to get our list of random seed users. From both user lists, we remove all users with more than 180 followees, which results in $|U_S|$ = 2,551 seed users for the CompSci dataset and $|U_S|$ = 3,466 seed users for the Random dataset. The threshold of a maximum of 180 followees is chosen because the Twitter Search API only allows 180 requests per 15 minutes, which makes it possible to crawl the tweets of all followees of a seed user within this time window.

(b) Crawl followees. Next, we use these follower relationships to crawl the followees $F$ of the seed users in order to create a directed user network for analyzing the social influence on hashtag reuse. Based on the number of seed users, the average number of followees per seed user is $|F|/|U_S|$ = 94 in the case of the CompSci dataset and 72 in the case of the Random dataset. Following this notation, the set of followees of user $u$ is denoted as $F_u$ in the remainder of this paper. Overall, our crawling procedure gives us $|U|$ = 91,776 total users for the CompSci dataset and $|U|$ = 127,112 total users for the Random dataset.

(c) Crawl tweets. In the third step, we crawl the 200 most recent tweets of all the users and remove the tweets in which no hashtags are used. The threshold of a maximum of 200 most recent tweets is set because of another restriction of the Twitter Search API, which only allows 200 tweets to be received per single request. This crawling procedure results in $|T|$ = 5,649,359 tweets for the CompSci dataset with an average number of tweets per user $|T|/|U|$ = 61, and $|T|$ = 8,157,702 tweets for the Random dataset with $|T|/|U|$ = 64. Our crawled tweets cover a time range from 2007 to 2015.
2https://dev.twitter.com/rest/public/search 3https://dev.twitter.com/streaming/overview htndividualreusecountof111111100000001234567 htndividualreusecountof111111110000000012345678 htSocialreusecountof 111111000000012345 htSocialreusecountof111110000012345 I100100 101 102 103 104 I100100 101 102 103 104 10−1100 101 102 103 104 100100 101 102 103 104 Reuserecencyofhtbyu[hours] Reuserecencyofhtbyu[hours] ReuserecencyofhtbyFu[hours] ReuserecencyofhtbyFu[hours] (a)Individualhashtagreuse (b)Individualhashtagreuse (c)Socialhashtagreuse (d)Socialhashtagreuse CompScidataset(R2=.883) Randomdataset(R2=.894) CompScidataset(R2=.689) Randomdataset(R2=.771) Figure2:TheeffectoftimeonindividualandsocialhashtagreusefortheCompSciandRandomdatasets(plotsareinlog-logscale). Plots(a)and(b)showthatthemorerecentlyahashtaghtwasusedbyauseru,thehigheritsindividualreusecount(i.e.,people tendtoreusehashtagsthathavebeenusedveryrecentlybytheirown). Plots(c)and(d)showthatthemorerecentlyauseruwas exposedtoahashtaght,whichwasusedbyherfolloweesF ,thehigheritssocialreusecount(i.e.,peopletendtoreusehashtagsthat u havebeenusedrecentlyinthesocialnetwork). Additionally,wereporttheR2estimatesforthelinearfitsofthedata. Wefindthat temporaleffectsplayanimportantroleinindividualandsocialhashtagreuseinbothdatasets. (d) Extract hashtags. Finally, we extract the hashtags of the Temporaleffectsonindividualhashtagreuse.Theeffectoftime tweets by searching for all words that start with a “#” character. onindividualhashtagreuseisvisualizedintheplots(a)and(b)of Thisresultsin|HTAS|=9,161,842hashtagassignmentsfor|HT| Figure2.Toputthex-scaleoftheseplotsontoameaningfulrange, =1,081,403distincthashtagsintheCompScinetworkand|HTAS| we set the threshold for the maximum hashtag reuse recency to =13,628,750for|HT|=1,507,773intheRandomnetwork.Thus, oneyear(i.e.,8,760hours). 
Theplotsshowtheindividualhashtag inbothdatasets,eachdistincthashtagisusedapproximately9times reusecountplottedoverthereuserecencyofahashtaghtbyauser onaverageandeachuserusesapproximately100hashtagassign- uinhours.Hence,foreachhashtagassignmentofahashtaghtby mentsinhertweetsonaverage.Examplesforpopularhashtagsare useru, wetakethetimesincethelastusageofhtbyu(i.e., the #bigdata,#iotand#uxincaseoftheCompScidataset,and#shah- reuserecency)andpooltogetherallhashtagassignmentswiththe bag,#ff and#artincaseoftheRandomdataset. samerecencyvalue(i.e.,thesametimedifferenceinhours). The individualreusecountforthisrecencyvalueisthengivenbythe Analysisofhashtagusagetypes.Inourdatasets,weanalyzehash- sizeofthesetofthesehashtagassignments. tagassignmentsaswellashashtagreusepracticeswiththeaimof Thetwoplotsshowsimilarresultsforbothdatasetsandindicate identifyingthedifferenttypesofhashtagusagesasaprerequisite that the more recently a hashtag ht was used by a user u in the forourrecommendationapproach. Specifically, foreachhashtag past,thehigheritsindividualreusecountis. Interestingly,thereis assignment, we study whether the corresponding hashtag has ei- aclearpeakafter24hoursinbothdatasets,whichfurtherindicates therbeenusedbythesameuserbefore(“individual”),bysomeof thatuserstypicallyusethesamesetofhashtagsinthistimespan herfollowees(“social”), byboth(“individual/social”), byanyone andthus,tendtotweetaboutsimilartopicsonadailybasis. Fur- elseinthedataset(“network”)orbyneitherofthem(“external”). thermore,wealsoobservehighR2valuesofnearly.9forthelinear TheresultsofthisstudyareshowninFigure1.Wefindthat66% fitsinthelog-logscaledplots,whichindicatesthatalargeamount ofhashtagassignmentsintheCompScidatasetand81%intheRan- ofourdatacanbeexplainedbyapowerfunction.Thisisalsosug- domdatasetcanbeexplainedbyindividualorsocialhashtagreuse. gestedbythepower-law-basedmodeloftheBLLequation[3,2]. 
This finding further corroborates our choice to utilize these two Incontrast,thelinearfitsinlog-linearscaledplotsonlyprovideR2 typesofinfluences(i.e.,individualandsocial)tocreateourmodel. valuesofapproximately.7,wherehighvalueswouldspeakinfavor Incontrasttotheselargenumbers,the6%to8%ofhashtagsinthe ofanexponentialfunction. “network”categoryisrelativelysmall.Interestingly,theamountof “external” hashtags is twice as high in the CompSci dataset (i.e., Temporal effects on social hashtag reuse. Plots (c) and (d) of 26%)asintheRandomone(i.e.,13%).Thus,inourdatasets,com- Figure2showtheeffectoftimeonthesocialhashtagreuseforthe puter scientists tend to use more hashtags, which have not been CompSciandRandomdatasets. Theseplotsarecreatedsimilarly previously introduced in the network, than random Twitter users. asplots(a)and(b)butthistime,weplotthesocialhashtagreuse Becauseofthis,webelievethattherecommendationaccuracyre- countoverthereuserecencyofahashtaghtbythefolloweesF of u sultswouldgenerallybelowerintheCompScidatasetthaninthe useru.Hence,foreachhashtagassignmentofhtbyu,wetakethe Randomone,whichwillbeevaluatedinSection5. Summingup, mostrecentusagetimestampofhtbyF . Thedifferencebetween u bothindividualandsocialhashtagshaveanimpactonusers’choice thistimestampandthetimestampofthecurrentlyanalyzedhashtag ofhashtagsforanewtweet. assignmentindicatesthetimesincethelastsocialexposureofht tou. Again,wesetthethresholdforthemaximumhashtagreuse 3. TEMPORALEFFECTSONHASHTAG recencytooneyear(i.e.,8,760hours). REUSEINTWITTER In these plots, we observe similar results for the two datasets since, in both cases, the more recently a user was exposed to a Inthissection,westudytowhatextenttemporaleffectsplaya hashtag,thehigheritssocialreusecountis. Furthermore,thereis roleinthereuseofindividualandsocialhashtagsinourtwodata- again(i)aclearpeakafter24hours,and(ii)theR2 valuesforthe sets(i.e.,CompSciandRandom). 
Specifically,weanalyzetherec- linearfitsinthelog-logscaledplots(i.e.,=.7)arelargerthaninthe encyofhashtagsassignments(i.e.,thetimesincethelasthashtag log-linearscaledplots(i.e.,=.4),whichspeaksinfavorofapower usage/exposure), aswellaswhetherthiseffectoftime-dependent function.Wenowstudyifthisisreallythecase. decayfollowsapower-laworexponentialdistribution. Dataset Parameter Individualhtreuse Socialhtreuse Scenario 1: Hashtag rec. w/o current tweet Hybrid combination xmin 141 1 User Hashtags of u Individual reuse CompSci α 1.699 1.242 u HTu BLLI Individual reuse R 188 164 BLL equation + social reuse Random xαmin 11.74213 1.2169 FollFowuees HashHtaTgFs uof Fu SocBiaLl LreSuse BLLI,S R 235 294 Scenario 2: Hashtag rec. w/ current tweet Individual reuse Table2: Power-lawvs. exponentialtime-dependentdecay. We Current tweet All tweets Content analysis + social reuse t T C + content analysis seethatapowerfunctionprovidesabetterfitthananexponen- TF-IDF BLLI,S,C tial function (R > 0) for explaining temporal effects on indi- Terms in t Similar tweets Hashtags of St vidualandsocialhashtagreuseinourtwodatasets(p<.001). Ct St HTSt Hybrid combination Power-lawvs. exponentialtime-dependentdecay. Thequestion Figure 3: Schematic illustration of our cognitive-inspired ap- whetherapoweroranexponentialfunctionisbettersuitedtomodel proachforhashtagrecommendations. Weimplementourap- thetime-dependentdecayofhashtagreuseisofinterestespecially proach in two scenarios (i.e., without and with incorporating forthedesignofourhashtagrecommendationapproachsinceboth thecontentofthecurrenttweet).InScenario1,weusetheBLL types of functions have been used in the area of time-aware rec- equation to realize (i) the individual BLLI algorithm, (ii) the ommendersystems. 
WhiletheBLLequationsuggeststheuseof socialBLLS algorithm,and(iii)thehybridBLLI,S algorithm, a power function to model the decay of item exposure in human whichcombinesboth.InScenario2,weuseTF-IDFtoidentify memory[3],relatedhashtagrecommenderapproaches,suchasthe similartweetsforacurrentlyproposedtweettandidentifythe oneproposedin[11],useanexponentialfunctionforthispurpose. hashtags of the most similar ones. We combine this content- Asalreadymentioned,thevisualinspectionofFigure2andtheR2 basedtweetanalysiswithourBLLI,S methodtoprovideper- valuesofthelinearfitsfavorapowerfunction. However, [5]has sonalizedandcontent-awarehashtagrecommendationsinthe shownthatthisleastsquares-basedmethodcanleadtomisinterpre- formofourhybridBLLI,S,C approach. tationsandthus,alikelihoodratio-basedtestissuggested. WeusethePythonimplementation[1]ofthemethoddescribed toforeseethetopicsaspecificuserwilltweetaboutbasedonthe in[5]tovalidateifapowerfunctionproducesabetterfitthanan predictedhashtags,whereasthesecondoneaimstosupportauser exponentialone. TheresultsofthistestareshowninTable2. The infindingthemostdescriptivehashtagsforanewtweettext[9]. mainvalueofinteresthereisthelog-likelihoodratioRbetweenthe For reasons of reproducibility, we implement and evaluate our twofunctions. Aswesee,R > 0inallfourcaseswithp < .001. approach by extending our open-source tag recommender bench- Thismeansthatthepowerfunctionindeedprovidesabetterfitthan marking framework TagRec. The source code and framework is the exponential function for explaining temporal effects on indi- freelyaccessibleforscientificpurposesontheWeb4. vidualandsocialhashtagreuse. Wealsoprovidethex andα min valuesofthefits.Inthisrespect,theαslopescanbeusedtosetthe 4.1 Scenario1: Hashtagrec. w/ocurrenttweet dparameteroftheBLLequation(i.e.,1.7intheindividualcaseand For the first variant of our approach, we ignore the content of 1.25inthesocialcase, seeSection4). 
Interestingly, thesevalues the current tweet t and solely utilize past hashtag usages. As al- are much higher than the suggested value of BLL’s d parameter, readystated,weusetheBLLequationcomingfromthecognitive whichis.5[2]. Webelievethatthisisthecasebecausetweeting architecture ACT-R [2, 3] for this task. We go for a cognitive- is more strongly influenced by temporal interest drifts than other inspiredapproach,sinceweknowfromresearchontheunderlying applicationsstudiedintheACT-Rcommunity(e.g.,[3]). mechanisms of social tagging that the way users choose tags for annotatingresources(e.g.,Weblinks)stronglycorrespondstopro- Finding1:Temporaleffectshaveanimportantinfluenceonboth cessesinhumanmemoryanditscognitivestructures[6,36]. The individualaswellassocialhashtagreuse: peopletendtoreuse BLLequationquantifiesthegeneralusefulnessofapieceofinfor- hashtags that were used very recently by their own and/or by mation(e.g.,awordorhashtag)byconsideringhowfrequentlyand theirTwitterfollowees. Furthermore,apowerfunctionisbetter recentlyitwasusedbyauserinthepast.Formally,itisgivenby: suited to model this time-dependent decay than an exponential one. ThissuggeststhattheBLLequationfromthecognitivear- n chitectureACT-Rshouldbeasuitablemodelfordesigningour B =ln((cid:88)t−d) (1) i j time-dependenthashtagrecommendationalgorithm. j=1 where B is the base-level activation of a memory unit i and n i 4. ACOGNITIVE-INSPIREDHASHTAG is the frequency of i’s occurrences in the past (i.e., how often i RECOMMENDATIONAPPROACH wasusedbyu). Furthermore,t statestherecency(i.e.,thetime j sincethejthoccurrenceofi)andtheexponentdaccountsforthe Intheprevioussection,wehaveshownthattemporaleffectsare power-lawoftime-dependentdecay. AsvisualizedinScenario1 importantfactorswhenusersreuseindividualandsocialhashtags. 
ofFigure3,weadopttheBLLequationfor(i)modelingthereuse Inthissection,weusetheseinsightsasabasistodesignourhashtag of individual hashtags (BLL ), (ii) modeling the reuse of social recommendationapproachillustratedinFigure3. Thus,wedistin- I hashtags(BLL ),and(iii)combiningtheformertwointoahybrid guishbetweenhashtagrecommendationswithout(Scenario1)and S recommendationapproach(BLL ). with(Scenario2)incorporatingthecurrenttweett. I,S Whereas the first variant of our approach solely uses the past Modelingindividualhashtagreuse.Inordertomodelthereuseof hashtags of a user u and/or her followees Fu, the second variant individualhashtags,wedefinetheindividualbase-levelactivation alsoutilizesthetextofthecurrenttweett. Hence,thesetwosce- nariosalsodifferintheirpossibleusecasessincethefirstoneaims 4https://github.com/learning-layers/TagRec B (ht,u)ofahashtaghtforauseruasfollows: Content-basedtweetanalysis. Weanalyzethecontentoftweets I inordertofindsimilartweetsforatargettweettandtoextractthe n B (ht,u)=ln((cid:88)(TS −TS )−dI) (2) hashtagsofthesesimilarones. Therefore,weincorporatetheterm I ref ht,u,j frequency-inverse document frequency (TF-IDF) statistic, which j=1 identifiestheimportanceofatermforadocumentinacollectionof wherendenotesthenumberoftimeshtwasusedbyuinthepast documents. TF-IDFcanbefurtherusedtocalculatethesimilarity (i.e.,|HTAS |)andthetermTS −TS statestherec- ht,u ref ht,u,j betweentwodocumentsdanddbysumminguptheTF-IDFstatis- ency of the jth usage of ht by u. In this respect, TS is the ref ticsofd’stermsind. WhenapplyingthisstatistictoTwitter, we referencetimestamp(i.e.,whenrecommendationsshouldbecalcu- treattweetsasdocumentsandcalculatethesimilaritybetweenthe lated)andTS isthetimestampwhenhtwasusedbyuforthe ht,u,j targettweettandacandidatetweettasfollows: jthtime.BasedontheresultsofouranalysispresentedinTable2, wesettheindividualtime-dependentdecayfactord to1.7. 
(cid:88) |T| I sim(t,t)= n ×log( ) (6) c,t |{t(cid:48) :c∈t(cid:48)}| Modeling social hashtag reuse. We model the reuse of social c∈Ct hashtagsinasimilarwaybutinsteadofanalyzinghowfrequently where C are the terms in the text of target tweet t, n is the andrecentlyahashtaghtwasusedbyuseru,weanalyzehowfre- t c,t numberoftimesc∈C occursinthecandidatetweett,|T|isthe quentlyandrecentlyhtwasusedbythesetoffolloweesF ofu. t u numberoftweetsinthedatasetand|{t(cid:48) :c∈t(cid:48)}|isthenumberof Thus,weformulatethesocialbase-levelactivationB (ht,u)ofht S timescoccursinanytweett(cid:48) ∈T.Thefirstfactorofthisequation foruasfollows: reflectsthetermfrequencyTF,whereasthesecondfactorreflects m B (ht,u)=ln((cid:88)(TS −TS )−dS) (3) theinversedocumentfrequencyIDF [45]. S ref ht,Fu,j Based on these similarity values, we identify the most similar j=1 tweetsS fortandextractthehashtagsusedinthesetweets(i.e., t wheremisthenumberoftimeshtwasusedbyFubeforetheref- HTSt). Foreachhashtaght ∈ HTSt,weassignacontent-based erencetimestampTSref (i.e.,|HTASht,Fu|).ThetermTSref − score CB(ht,t), which is the highest similarity value within the TSht,Fu,j statestherecencyofthejthexposureofhttoucaused most similar tweets St in which ht occurs. We implement this byFu,whereTSht,Fu,jisthetimestampwhenhtwasusedbyFu methodusingtheLucene-basedfull-textsearchengineApacheSolr forthejthtime. Aswhenmodelingtheindividualhashtagreuse, 4.7.105. BasedonSolr’ssoftwaredocumentationandourownex- wesetthesocialtime-dependentdecayfactordS basedonthere- perimentation,wesettheminimumtermfrequencytf to2andthe sultsofouranalysisinTable2(i.e.,to1.25). minimumdocumentfrequencydf to5. Combiningindividualandsocialhashtagreuse.Aswehavefor- Combining personalized and content-aware hashtag rec. 
We malizedtheindividualaswellassocialhashtagreuse,wewantto combineourpersonalizedBLL approachwiththiscontent-based I,S mixbothcomponentsinformofahybridapproachusingalinear analysis(C)inordertogeneratepersonalizedhashtagrecommen- combination [16]. Hence, in order to be able to add the individ- dations(seeFigure3). Again,weachievethisviaalinearcombi- ualandsocialbase-levelactivationsBI(ht,u)andBS(ht,u),we nationofbothapproaches.Takentogether,thetop-krecommended havetomapthesevaluesontoacommonrangeof0to1thatadd hashtagsH(cid:103)Tu,tforuseruandtweettaregivenby: upto1. Therefore,wedefinethesoftmaxfunctionsσ(B (ht,u)) I andσ(BS(ht,u))asproposedby[30,21].Thisisgivenby: H(cid:103)Tu,t = argkmax(λBI,S(ht,u)+(1−λ)σ(CB(ht,t))) (7) exp(B (t,u)) ht∈HTu,t (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) σ(BI(ht,u))= (cid:80) expI(B (ht(cid:48),u)) (4) BLLI,S C I ht(cid:48)∈HTu where HTu,t is the set of candidate hashtags for u and t (i.e., HT ∪HT ∪HT ). Theλparameterisusedtogiveweights whereHT isthesetofdistincthashtagsusedbyu.ForB (ht,u), u Fu St u S to the personalized and content-aware components. To that end, thesoftmaxfunctionσ(B (ht,u))canbecalculatedinthesame S we set λ to .3 based on experimentation. Please note that the waybutonthebasisofHT (i.e.,thesetofhashtagsusedbyu’s Fu content-basedscoreCB(ht,t)hastobenormalizedusingthesoft- followeesF ). Takentogether,thecombinedbase-levelactivation u maxfunction(seeEquation4),whereasB (ht,u)isalreadynor- B forourBLL approachisgivenby: I,S I,S I,S malized(seeEquation5). Thisfinallyconstitutesourpersonalized BI,S(ht,u)=βσ(BI(ht,u))+(1−β)σ(BS(ht,u)) (5) hashtagrecommendationalgorithmtermedBLLI,S,C. (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) BLLI BLLS 5. EVALUATION wheretheβparametercanbeusedtogiveweightstothetwocom- ponents.Basedonexperimentation,wesetβto.5toequallyweigh Inthissection,wepresenttheevaluationofourapproach. 
This theindividualandsocialinfluence.AsindicatedinEquation5and includes the methodology used as well as the results in terms of Figure3, wecanalsocalculatepredictionseithersolelybasedon recommendationaccuracyandrankingforourtwoscenarios. theindividualhashtagreuse,referredasBLL ,orthesocialhash- I 5.1 Methodology tagreuse,referredasBLL . S The methodology of our evaluation is given by the evaluation 4.2 Scenario2: Hashtagrec. w/currenttweet protocol,evaluationmetricsandbaselinealgorithmsused. AsshowninScenario2ofFigure3, thesecondvariantofour Evaluationprotocol.Inordertosplitourdatasetsintotrainingand approachaimstoprovidehashtagsuggestionswhilealsoincorpo- test sets, we use an established leave-one-out evaluation protocol ratingthecontentofthecurrentlyproposedtweett.Thus,webuild fromresearchoninformationretrievalandrecommendersystems ontheunpersonalizedmethodproposedby[45]tofindhashtagsof [16].Foreachseeduserinourdatasets(seeSection2)withatleast similartweetsandcombinethismethodwithourBLL approach I,S togeneratepersonalizedandcontent-awarerecommendations. 5http://lucene.apache.org/solr/ twotweets(i.e.,2,020usersintheCompScidatasetand2,679users standardFRimplementationprovidedbytheUniversityofKassel7 intheRandomdataset),wedeterminehermostrecenttweetandput withitssuggesteddefaultparameters.Morespecifically,theweight it(anditshashtags)intothetestset.Theremainingtweetsarethen ofthepreferencevectordissetto.7andthemaximumnumberof put into the training set. This protocol ensures not only that the iterationslissetto10[16]. hashtagsofatleastonetweetperuserareavailablefortrainingbut CF. User-based Collaborative Filtering is a well-known algo- alsothatthechronologicalorderofthedataispreserved(i.e.,future rithmusedinmanyvariantsofmodernrecommendersystemsand hashtagsarepredictedbasedonusagepatternsofpastones). We was adapted by [29] for use in tag-based settings. 
use these sets in two evaluation scenarios:

Scenario 1. In the first scenario, we ignore the content of the currently proposed tweet (i.e., the one in the test set) and provide hashtag predictions based solely on the current user-id. Thus, in Scenario 1, we are able to evaluate all test set tweets.

Scenario 2. In the second scenario, we also incorporate the content of the current tweet. In this setting, we only evaluate the test set entries that are not retweets (i.e., 954 test set tweets in the CompSci dataset and 1,504 test set tweets in the Random dataset). We exclude retweets from the test set in Scenario 2 because searching for similar tweets in the training set would return identical tweets with identical hashtags, which would heavily bias our evaluation (see also [45]).

Evaluation metrics. To quantify the quality of the algorithms, for each test set entry, we compare the top-10 hashtags an algorithm predicts for the given user u and tweet t (i.e., $\widetilde{HT}_{u,t}$) with the set of relevant hashtags actually used by u in t. This comparison is done using various evaluation metrics known from the field of recommender systems. Specifically, we report Precision (P) and Recall (R) for k = 1 to 10 predicted hashtags by means of Precision/Recall plots, and F1-score (F1@5) for k = 5 predicted hashtags. We set k = 5 for the F1-score since F1@5 was also used as the main evaluation metric in the well-known ECML PKDD 2009 discovery challenge (footnote 6). Additionally, we report the ranking-dependent metrics Mean Reciprocal Rank (MRR@10), Mean Average Precision (MAP@10) and Normalized Discounted Cumulative Gain (nDCG@10) for k = 10 predicted hashtags [14].

Baseline algorithms. We compare our approach to a rich set of 9 state-of-the-art hashtag recommendation algorithms:

MP_I. The Most Popular Individual Hashtags algorithm ranks the hashtags based on their frequency in the hashtag assignments of current user u. MP_I is also referred to as Most Popular Tags by User (MP_u) in tag recommendation literature [16].

MR_I. Most Recent Individual Hashtags is a time-dependent variant of MP_I. MR_I suggests the k most recently used hashtags of current user u [4]. Our BLL_I approach can be seen as an integrated combination of MP_I and MR_I based on human memory theory.

MP_S. The Most Popular Social Hashtags algorithm is the social counterpart of the individual MP_I approach [16]. Thus, MP_S does not rank the hashtags based on their frequency in the hashtag assignments of user u but based on their frequency in the hashtag assignments of user u's set of followees F_u.

MR_S. Most Recent Social Hashtags is the time-dependent equivalent of MP_S. MR_S sorts the hashtag assignments of u's followees F_u by time and recommends the k most recent ones. Our BLL_S algorithm is a cognitive-inspired integration of MP_S and MR_S.

MP. The unpersonalized Most Popular Hashtags approach returns the same set of hashtags for any user. These hashtags are ranked by their overall frequency in the dataset [16].

FR. FolkRank is an adaptation of Google's PageRank approach used to rank the entities in folksonomy graphs and has become one of the most successful tag recommender methods [12]. We use the

We apply the same idea for the task of recommending hashtags and thus first identify the k most similar users (i.e., the nearest neighbors) for current user u by means of the cosine similarity measure and then suggest the hashtags used by these neighbors. For our experiments, we use a neighborhood size k of 20 users (see also [8]).

SR. SimilarityRank is an unpersonalized hashtag recommendation algorithm, which utilizes the content of the currently proposed tweet t [45]. Similarly to our BLL_{I,S,C} approach, this is achieved using TF-IDF to determine content-based similarity scores between tweets (see Section 4.2). These scores are used to recommend the k hashtags that occur in t's most similar tweets.

TCI. TemporalCombInt is one of the most recent approaches for personalized hashtag recommendations and also one of the very few approaches that account for the effect of time on hashtag usage [11] (see also Section 6). TCI builds on a linear combination of SR and CF and incorporates temporal effects by considering the time-dependent relevance of a hashtag with respect to the recommendation date. This is done by categorizing the hashtags into "organizational" and "conversational" hashtags, and modeling the decay of temporal relevance using an exponential function. By fitting this model to our crawled data, we set the two main parameters of the algorithm, η_l and η_h, to .1 and .2, respectively.

5.2 Results and Discussion

In Section 3, we found that time is an important factor for hashtag reuse. Because of this, we assume that our time-dependent and cognitive-inspired approach should provide reasonable results compared to other algorithms. The accuracy estimates for our two evaluation scenarios are shown in Table 3 and Figure 4.

Scenario 1: Hashtag rec. w/o current tweet. In our first evaluation scenario, we validate approaches that predict future hashtags without incorporating the content of the currently proposed tweet. Here, we identify three main results:

(a) BLL_I > MP_I, MR_I. When predicting individual hashtag reuse, we compare our BLL_I approach to the frequency-based MP_I and the recency-based MR_I algorithms. The results clearly reflect the importance of the time component since MR_I and BLL_I provide higher prediction accuracy and ranking estimates than MP_I for all evaluation metrics across both datasets. Apart from that, we observe that BLL_I outperforms MR_I, which speaks in favor of the cognitive-inspired combination of hashtag frequency and recency by means of the BLL equation.

(b) BLL_S > MP_S, MR_S. Concerning the prediction of social hashtag reuse, we compare our BLL_S approach to the frequency-based MP_S and the recency-based MR_S methods. Similar to the case of individual hashtag reuse, MR_S and our BLL-based method provide higher accuracy estimates than the solely frequency-based one, but interestingly, this time the differences between these methods are much larger. This indicates that the time information is especially important in a social setting. We expected this behavior to some extent since typically only the most recent tweets of the followees are shown on a user's Twitter timeline. Again, the combination of hashtag frequency and recency by means of the BLL equation provides the best results.
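The difference between the frequency-based, recency-based and BLL-based rankings discussed in (a) and (b) can be illustrated with a minimal sketch. It assumes the common ACT-R form of the base-level learning equation, B = ln(sum_j (t_ref - t_j)^(-d)), with the frequently used default d = 0.5; the function names, the toy history and the value of d are illustrative assumptions and not the fitted parameters or the implementation used in the experiments.

```python
import math
from collections import Counter


def rank_most_popular(history, k=10):
    """MP-style baseline: order hashtags by how often they were used."""
    counts = Counter(tag for tag, _ in history)
    ranked = sorted(counts, key=lambda tag: (-counts[tag], tag))
    return ranked[:k]


def rank_most_recent(history, k=10):
    """MR-style baseline: order hashtags by their latest usage time."""
    last_used = {}
    for tag, ts in history:
        last_used[tag] = max(ts, last_used.get(tag, ts))
    ranked = sorted(last_used, key=lambda tag: (-last_used[tag], tag))
    return ranked[:k]


def rank_bll(history, now, d=0.5, k=10):
    """BLL-style score: B(tag) = ln(sum_j (now - ts_j) ** -d).

    d = 0.5 is an assumed decay exponent, not a value fitted to Twitter data.
    """
    sums = {}
    for tag, ts in history:
        age = now - ts
        if age > 0:  # skip uses that are not strictly in the past
            sums[tag] = sums.get(tag, 0.0) + age ** (-d)
    scores = {tag: math.log(total) for tag, total in sums.items()}
    ranked = sorted(scores, key=lambda tag: (-scores[tag], tag))
    return ranked[:k]


# Hypothetical history of (hashtag, timestamp) pairs; the time unit is arbitrary.
history = [
    ("#recsys", 0), ("#recsys", 10), ("#recsys", 20),
    ("#www", 800), ("#www", 850),
    ("#euro2016", 900),
]
print(rank_most_popular(history))
print(rank_most_recent(history))
print(rank_bll(history, now=1000))
```

On this toy history the three rankings differ: frequency puts #recsys first, recency puts #euro2016 first, and the BLL-style score puts #www first because it was used both repeatedly and fairly recently. The same functions could be applied to the followees' hashtag assignments to sketch the social variants.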
6 http://www.kde.cs.uni-kassel.de/ws/dc09/evaluation.
7 http://www.kde.cs.uni-kassel.de/code

Columns MP_I through BLL_{I,S} belong to Scenario 1 (hashtag rec. w/o current tweet); columns SR through BLL_{I,S,C} belong to Scenario 2 (hashtag rec. w/ current tweet).

| Dataset | Metric  | MP_I | MR_I | BLL_I | MP_S | MR_S | BLL_S | MP   | FR   | CF   | BLL_{I,S} | SR   | TCI  | BLL_{I,S,C} |
|---------|---------|------|------|-------|------|------|-------|------|------|------|-----------|------|------|-------------|
| CompSci | F1@5    | .086 | .098 | .101  | .022 | .076 | .118  | .006 | .083 | .099 | .153∗∗∗   | .139 | .182 | .200∗       |
| CompSci | MRR@10  | .136 | .188 | .193  | .032 | .122 | .187  | .007 | .130 | .163 | .268∗∗∗   | .264 | .334 | .395∗∗∗     |
| CompSci | MAP@10  | .143 | .195 | .202  | .033 | .128 | .205  | .007 | .136 | .169 | .285∗∗∗   | .283 | .354 | .417∗∗∗     |
| CompSci | nDCG@10 | .175 | .218 | .225  | .046 | .154 | .235  | .012 | .169 | .196 | .324∗∗∗   | .299 | .385 | .446∗∗      |
| Random  | F1@5    | .160 | .169 | .175  | .072 | .103 | .138  | .012 | .159 | .165 | .208∗∗∗   | .181 | .243 | .261∗       |
| Random  | MRR@10  | .261 | .300 | .314  | .109 | .159 | .220  | .023 | .260 | .278 | .361∗∗∗   | .341 | .436 | .489∗∗      |
| Random  | MAP@10  | .279 | .315 | .335  | .116 | .171 | .240  | .024 | .279 | .296 | .389∗∗∗   | .374 | .472 | .530∗∗      |
| Random  | nDCG@10 | .323 | .352 | .370  | .144 | .205 | .280  | .035 | .324 | .333 | .434∗∗∗   | .388 | .507 | .562∗∗      |

Table 3: Recommender accuracy results of our two evaluation scenarios. In Scenario 1, we compare approaches that ignore the current tweet content, while in Scenario 2, we compare algorithms that also incorporate the current tweet. We observe that (i) BLL_I outperforms MP_I and MR_I, (ii) BLL_S outperforms MP_S and MR_S, (iii) BLL_{I,S} outperforms MP, FR and CF, and (iv) BLL_{I,S,C} outperforms SR and TCI. Based on a t-test, the symbols ∗ (α=.1), ∗∗ (α=.01) and ∗∗∗ (α=.001) indicate statistically significant differences between BLL_{I,S} and CF in Scenario 1, and between BLL_{I,S,C} and TCI in Scenario 2.
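The metrics reported in Table 3 can be sketched for a single test set entry as follows. This is a minimal illustration assuming binary relevance, Average Precision normalized by the number of relevant hashtags, and the log2 discount for nDCG; the function names are hypothetical, and the exact conventions used for the reported numbers may differ. The dataset-level values (MRR, MAP) would then be means of these per-entry scores over all test set entries.

```python
import math


def f1_at_k(predicted, relevant, k=5):
    """Harmonic mean of Precision@k and Recall@k for one test entry."""
    top = predicted[:k]
    hits = sum(1 for tag in top if tag in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def reciprocal_rank(predicted, relevant, k=10):
    """1 / rank of the first relevant hashtag in the top k, else 0."""
    for rank, tag in enumerate(predicted[:k], start=1):
        if tag in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(predicted, relevant, k=10):
    """Mean of Precision@rank over the ranks that hold a relevant hashtag.

    Assumption: normalized by the number of relevant hashtags.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, tag in enumerate(predicted[:k], start=1):
        if tag in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(predicted, relevant, k=10):
    """Binary-relevance nDCG with a log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, tag in enumerate(predicted[:k], start=1)
        if tag in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical prediction list and ground truth for one tweet.
predicted = ["#a", "#b", "#c", "#d", "#e"]
relevant = {"#b", "#e", "#z"}
print(f1_at_k(predicted, relevant, k=5))
print(reciprocal_rank(predicted, relevant))
print(average_precision(predicted, relevant))
print(ndcg_at_k(predicted, relevant))
```

In this example two of the three relevant hashtags are recovered at ranks 2 and 5, so the ranking-dependent metrics penalize the late second hit more strongly than F1@5 does.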
[Figure 4: Precision/Recall plots (x-axis: Recall, y-axis: Precision). Panels: (a) Scenario 1: Hashtag rec. w/o current tweet, CompSci dataset; (b) Scenario 1: Hashtag rec. w/o current tweet, Random dataset; (c) Scenario 2: Hashtag rec. w/ current tweet, CompSci dataset; (d) Scenario 2: Hashtag rec. w/ current tweet, Random dataset. Panels (a) and (b) show BLL_I, BLL_S, CF and BLL_{I,S}; panels (c) and (d) show SR, TCI and BLL_{I,S,C}.]

Figure 4: Precision/Recall plots of our two evaluation scenarios showing the accuracy of BLL_I, BLL_S, CF, BLL_{I,S}, SR, TCI and BLL_{I,S,C} for k = 1-10 recommended hashtags. Again, BLL_{I,S} provides the best results in Scenario 1 and BLL_{I,S,C} in Scenario 2.

(c) BLL_{I,S} > MP, FR, CF. Finally, we compare our hybrid BLL_{I,S} approach to the unpersonalized MP algorithm, the well-known FR method from tag recommender research and classic user-based CF. The first observation that becomes apparent is the poor performance of the unpersonalized MP baseline, which underpins the importance of personalized methods for hashtag recommendation. Additionally, and more importantly, our hybrid BLL_{I,S} approach not only improves on its BLL_I and BLL_S components but also provides significantly higher accuracy and ranking estimates than FR and CF. This shows that BLL_{I,S} is capable of providing reasonable hashtag recommendations solely based on temporal usage patterns of past hashtag assignments.

Scenario 2: Hashtag rec. w/ current tweet. In the second scenario, we evaluate hashtag recommendation methods that also incorporate the content of the current tweet. This includes the unpersonalized SR approach, the time-dependent TCI algorithm and our BLL_{I,S,C} approach. Our two main results are:

(a) TCI, BLL_{I,S,C} > SR. The first main result of our second evaluation scenario is that both time-dependent methods TCI and BLL_{I,S,C} outperform the unpersonalized SR approach. We expected this result to some extent since both TCI and BLL_{I,S,C} extend the TF-IDF-based tweet content analysis of SR with personalization techniques via CF (TCI) or the BLL equation (BLL_{I,S,C}).

(b) BLL_{I,S,C} > TCI. The second main result of Scenario 2 is that BLL_{I,S,C} provides significantly higher accuracy estimates than TCI. This is due to three main differences between these methods: (i) instead of using hashtags of similar users by means of CF for adding personalization, we incorporate not only individual hashtags of the current user but also social hashtags of the current user's followees, (ii) instead of applying the effect of time on a global hashtag level, we model the time-dependent decay on an individual and social level, and (iii) instead of modeling this time-dependent decay using an exponential function, we use a power function by means of the BLL equation.

CompSci dataset vs. Random dataset. Another interesting finding is that all algorithms provide better results for the Random dataset than for the CompSci dataset. In our case, this indicates that the task of predicting hashtags in the domain-specific network of computer scientists is harder than in the network of random users. If we look back at Figure 1, this makes sense since the share of "external" hashtags is twice as high in the CompSci dataset (i.e., 26%) as in the Random one (i.e., 13%).

Finding 2: The BLL equation, which accounts for temporal effects of item exposure in human memory, provides a suitable model for personalized hashtag recommendations. This is validated in two evaluation scenarios (i.e., without and with incorporating the content of the current tweet), in which our cognitive-inspired approach outperforms several state-of-the-art hashtag recommendation algorithms in terms of prediction accuracy.

6. RELATED WORK

Over the past years, tagging has emerged as an important feature of the social Web, which helps users to collaboratively organize and find content [18]. Two types of tags have been established: (i) social tags as used in systems like BibSonomy and CiteULike, and (ii) hashtags as used in systems like Twitter and Instagram. Whereas social tags are mainly used to index resources for later retrieval, hashtags have a more conversational nature and are used to filter and direct content to certain streams of information [13].

One of the most prominent approaches in the field of tag recommendations is the FolkRank algorithm [12, 15, 16]. FolkRank is an extension of the well-known Google PageRank approach to rank the entities in folksonomies (i.e., users, resources and tags). Other important tag recommendation methods are based on Collaborative Filtering [29, 8], Latent Dirichlet Allocation [23, 22] or Tensor Factorization [33, 32]. Recent observations in the field of social tagging state the importance of the time component for the individual tagging behavior of users. In this respect, [47, 43, 44] propose time-dependent tag recommender approaches, which model the tagging variation over time using exponential functions. In our previous work [21, 20], we presented a more theory-driven approach, where we use the BLL equation coming from the cognitive architecture ACT-R [3, 2] to model the power-law of time-dependent decay. We evaluated our approach in detail and compared it to other state-of-the-art methods in [19]. In the present work, we build upon our results and incorporate the BLL equation to study the effect of time on hashtag reuse to design our hashtag recommendation approach.

There is already a large body of research available that focuses on the recommendation of hashtags in Twitter. One illustrative example is the work presented in [9], in which hashtag recommendations are provided by categorizing tweets into general topics using LDA. The approach then recommends the hashtags that best fit the topics of a new tweet. The authors evaluate their approach using a qualitative study, in which they ask persons if the recommended hashtags describe the topics of a tweet and could be used to semantically enrich it. In 80% of the cases, they are able to provide a suitable hashtag from a selection of five possibilities. Other similar approaches that use topic models for hashtag recommendations are presented in [37, 40, 41, 7]. In [17], a related algorithm based on a hashtag classification scheme is proposed. The most notable work in the context of hashtag recommendations is probably the content-based SR approach presented in [45] and [46]. The authors use the TF-IDF statistic to calculate similarities between tweets and identify suitable hashtags based on these similarity scores. They show that SR improves Recall and Precision by around 35% compared to a popularity-based approach. Our BLL_{I,S,C} approach uses the same statistic to integrate the content of a user's currently proposed tweet. In [25], a personalized extension of SR is presented, in which the authors combine it with user-based CF. Apart from that, a content-based hashtag recommendation algorithm for hyperlinked tweets is proposed in [35].

Related research has studied temporal effects on hashtag usage, for instance in the context of popular hashtags in Twitter [27, 26, 39, 28]. For example, in [28], the authors aim to predict if a specific hashtag will be popular on the next day. By formulating this task as a classification problem, they find that both content features (e.g., the topic of the hashtag) and context features (e.g., the users who used the hashtags) are effective features for popularity prediction. A similar approach is presented in [42], in which the authors uncover the temporal dynamics of online content (e.g., tweets) by formulating a time series clustering problem. One of the very few examples of a time-aware hashtag recommendation approach is the recently proposed algorithm described in [11]. The authors extend the content-based SR approach [45] with a personalization technique by means of CF and further consider the temporal relevance of hashtags. To account for this temporal relevance, they divide the hashtags into two categories: "organizational" ones, which are used over a long period of time, and "conversational" ones, which are used only during a short time span (e.g., for a specific event).

In contrast to our proposed algorithm, which relies on the BLL equation, their approach considers the effect of time on a global hashtag level of the whole Twitter network and not on an individual and social level of a specific user. Furthermore, we use a power function rather than an exponential one to model the time-dependent decay based on our empirical findings.

7. CONCLUSION AND FUTURE WORK

In this paper, we presented a cognitive-inspired approach for hashtag recommendations in Twitter. Our approach utilizes the BLL equation from the cognitive architecture ACT-R to account for temporal effects on individual hashtag reuse (i.e., reusing own hashtags) and social hashtag reuse (i.e., reusing hashtags that have previously been used by a followee). Our analysis of hashtag usage types in two empirical networks (i.e., CompSci and Random datasets) crawled from Twitter reveals that between 66% and 81% of hashtag assignments can be explained by past individual and social hashtag usage. By analyzing the timestamps of these hashtag assignments, we find that temporal effects play an important role for both individual and social reuse of hashtags and that a power function provides a better fit to model this time-dependent decay than an exponential function.

Thus, the more recently a hashtag was used by a user or her followees, the higher the probability that this user will use the same hashtag again later in time. Based on these findings, we utilized the Base-Level Learning (BLL) equation of the cognitive architecture ACT-R, which accounts for the time-dependent decay of item exposure in human memory, to develop BLL_{I,S} and BLL_{I,S,C}, two algorithms for recommending hashtags. Whereas BLL_{I,S} aims to recommend hashtags without incorporating the current tweet (Scenario 1), BLL_{I,S,C} also utilizes the content of the current tweet using the TF-IDF statistic (Scenario 2). We compared both algorithms to state-of-the-art hashtag recommendation algorithms and found that our cognitive-inspired approaches outperform these algorithms in terms of prediction accuracy and ranking.

One limitation of this work is that we model the reuse of social hashtags solely by analyzing how frequently and recently a hashtag was used by a user's followees, neglecting by whom the hashtag was used. Thus, for future work, we plan to extend our approach with the social status of the followee (e.g., via the reputation of the user by means of the number of followers). In this respect, we will also utilize the social connection strength between a user and her followee (e.g., by the number of mentions or retweets).

With respect to the hashtag assignments that cannot be explained by hashtag reuse (i.e., 26% in the CompSci dataset and 13% in the Random dataset), we want to utilize an external knowledge base to also account for these hashtag assignments. We will achieve this by suggesting hashtags of currently trending topics or events. Finally, we also plan to verify our findings in larger Twitter data samples than the ones used in this paper as well as in other online social networks that feature hashtags, such as Instagram and Facebook.

In summary, our work contributes to the rich line of research on improving the use of hashtags in social networks. We hope that future work will be attracted by our insights into how temporal effects on hashtag usage can be modeled using models from human memory theory, such as the BLL equation.

Acknowledgments. The authors would like to thank Matthias Traub and Dieter Theiler for valuable inputs. This work is funded by the Know-Center and the EU project AFEL (GA: 687916).

8. REFERENCES

[1] J. Alstott, E. Bullmore, and D. Plenz. powerlaw: a python package for analysis of heavy-tailed distributions. PloS one, 9(1):e85777, 2014.
[2] J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin. An integrated theory of the mind. Psychological review, 111(4):1036, 2004.
[3] J. R. Anderson and L. J. Schooler. Reflections of the environment in memory. Psychological Science, 2(6):396–408, 1991.
[4] P. G. Campos, F. Díez, and I. Cantador. Time-aware recommender systems: a comprehensive survey and analysis of existing evaluation protocols. User Modeling and User-Adapted Interaction, 24(1-2):67–119, 2014.
[5] A. Clauset, C. R. Shalizi, and M. E. Newman. Power-law distributions in empirical data. SIAM review (SIREV), 51(4):661–703, 2009.
[6] U. Cress, C. Held, and J. Kimmerle. The collective knowledge of social tags: Direct and indirect influences on navigation, learning, and information processing. Computers & Education, 60(1):59–73, 2013.
[7] M. Efron. Hashtag retrieval in a microblogging environment. In Proc. of SIGIR'10, pages 787–788. ACM, 2010.
[8] J. Gemmell, T. Schimoler, M. Ramezani, L. Christiansen, and B. Mobasher. Improving folkrank with item-based collaborative filtering. Recommender Systems & the Social Web, 2009.
[9] F. Godin, V. Slavkovikj, W. De Neve, B. Schrauwen, and R. Van de Walle. Using topic models for twitter hashtag recommendation. In Proc. of WWW'13 companion, pages 593–596, Republic and Canton of Geneva, Switzerland, 2013. ACM.
[10] A. T. Hadgu and R. Jäschke. Identifying and analyzing researchers on twitter. In Proc. of WebSci'14, pages 23–30, New York, NY, USA, 2014. ACM.
[11] M. Harvey and F. Crestani. Long time, no tweets! time-aware personalised hashtag suggestion. In Proc. of ECIR'15, pages 581–592. Springer, 2015.
[12] A. Hotho, R. Jäschke, C. Schmitz, G. Stumme, and K.-D. Althoff. Folkrank: A ranking algorithm for folksonomies. In Proc. of LWA'06, volume 1, pages 111–114, 2006.
[13] J. Huang, K. M. Thornton, and E. N. Efthimiadis. Conversational tagging in twitter. In Proc. of HT'10. ACM, 2010.
[14] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proc. of SIGIR'00, pages 41–48, New York, NY, USA, 2000. ACM.
[15] R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in folksonomies. In Proc. of PKDD'07, pages 506–514. Springer, 2007.
[16] R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme. Tag recommendations in social bookmarking systems. AI Communications, 21(4):231–247, 2008.
[17] M. Jeon, S. Jun, and E. Hwang. Hashtag recommendation based on user tweet and hashtag classification on twitter. In Proc. of WAIM'14, pages 325–336. Springer, 2014.
[18] C. Körner, D. Benz, A. Hotho, M. Strohmaier, and G. Stumme. Stop thinking, start tagging: Tag semantics emerge from collaborative verbosity. In Proc. of WWW'10, pages 521–530, New York, NY, USA, 2010. ACM.
[19] D. Kowald and E. Lex. Evaluating tag recommender algorithms in real-world folksonomies: A comparative study. In Proc. of RecSys'15, pages 265–268, New York, NY, USA, 2015. ACM.
[20] D. Kowald and E. Lex. The influence of frequency, recency and semantic context on the reuse of tags in social tagging systems. In Proc. of HT'16, pages 237–242. ACM, 2016.
[21] D. Kowald, P. Seitlinger, C. Trattner, and T. Ley. Long time no see: The probability of reusing tags as a function of frequency and recency. In Proc. of WWW'14 companion, pages 463–468. ACM, 2014.
[22] R. Krestel and P. Fankhauser. Personalized topic-based tag recommendation. Neurocomputing, 76(1):61–70, 2012.
[23] R. Krestel, P. Fankhauser, and W. Nejdl. Latent dirichlet allocation for tag recommendation. In Proc. of RecSys'09, pages 61–68. ACM, 2009.
[24] H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proc. of WWW'10, pages 591–600, New York, NY, USA, 2010. ACM.
[25] S. M. Kywe, T.-A. Hoang, E.-P. Lim, and F. Zhu. On recommending hashtags in twitter networks. In Social Informatics, pages 337–350. Springer, 2012.
[26] J. Lehmann, B. Gonçalves, J. J. Ramasco, and C. Cattuto. Dynamical classes of collective attention in twitter. In Proc. of WWW'12, pages 251–260. ACM, 2012.
[27] J. Lin and G. Mishne. A study of "churn" in tweets and real-time search queries. In Proc. of ICWSM'12, 2012.
[28] Z. Ma, A. Sun, and G. Cong. Will this #hashtag be popular tomorrow? In Proc. of SIGIR'12, pages 1173–1174. ACM, 2012.
[29] L. B. Marinho and L. Schmidt-Thieme. Collaborative tag recommendations. In Data Analysis, Machine Learning and Applications, pages 533–540. Springer, 2008.
[30] J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proc. of RecSys'13, pages 165–172. ACM, 2013.
[31] S. A. Myers and J. Leskovec. The bursty dynamics of the twitter information network. In Proc. of WWW'14, pages 913–924, New York, NY, USA, 2014. ACM.
[32] S. Rendle, L. Balby Marinho, A. Nanopoulos, and L. Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In Proc. of KDD'09, pages 727–736. ACM, 2009.
[33] S. Rendle and L. Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In Proc. of WSDM'10, pages 81–90. ACM, 2010.
[34] D. M. Romero, B. Meeder, and J. Kleinberg. Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on twitter. In Proc. of WWW'11, pages 695–704, New York, NY, USA, 2011. ACM.
[35] S. Sedhai and A. Sun. Hashtag recommendation for hyperlinked tweets. In Proc. of SIGIR'14, pages 831–834. ACM, 2014.
[36] P. Seitlinger, T. Ley, and D. Albert. Verbatim and semantic imitation in indexing resources on the web: A fuzzy-trace account of social tagging. Applied Cognitive Psychology, 29(1):32–48, 2015.
[37] J. She and L. Chen. Tomoha: Topic model-based hashtag recommendation on twitter. In Proc. of WWW'14 companion, pages 371–372, Republic and Canton of Geneva, Switzerland, 2014. ACM.
[38] T. A. Small. What the hashtag? a content analysis of canadian politics on twitter. Information, Communication & Society, 14(6):872–895, 2011.
[39] O. Tsur and A. Rappoport. What's in a hashtag?: Content based prediction of the spread of ideas in microblogging communities. In Proc. of WSDM'12, pages 643–652, New York, NY, USA, 2012. ACM.
[40] Y. Wang, J. Qu, J. Liu, J. Chen, and Y. Huang. What to tag your microblog: Hashtag recommendation based on topic analysis and collaborative filtering. In Web Technologies and Applications, pages 610–618. Springer, 2014.
[41] J. Xu, Q. Zhang, and X. Huang. Personalized hashtag suggestion for microblogs. In Proc. of SMP'15, pages 38–50. Springer, 2015.
[42] J. Yang and J. Leskovec. Patterns of temporal variation in online media. In Proc. of WSDM'11, pages 177–186. ACM, 2011.
[43] D. Yin, L. Hong, and B. D. Davison. Exploiting session-like behaviors in tag prediction. In Proc. of WWW'11 companion, pages 167–168. ACM, 2011.
[44] D. Yin, L. Hong, Z. Xue, and B. D. Davison. Temporal dynamics of user interests in tagging systems. In Proc. of AAAI'11. AAAI, 2011.
[45] E. Zangerle, W. Gassler, and G. Specht. Recommending #-tags in twitter. In Proc. of SASWeb'11, pages 67–78. CEUR Workshop Proceedings, 2011.
[46] E. Zangerle, W. Gassler, and G. Specht. On the impact of text similarity functions on hashtag recommendations in microblogging environments. Social Network Analysis and Mining, 3(4):889–898, 2013.
[47] L. Zhang, J. Tang, and M. Zhang. Integrating temporal usage pattern into personalized tag prediction. In Web Technologies and Applications. Springer, 2012.