Marked Temporal Dynamics Modeling based on Recurrent Neural Network YongqingWang,ShenghuaLiu,HuaweiShen,XueqiCheng CASKeyLaboratoryofNetworkDataScienceandTechnology, InstituteofComputingTechnology,ChineseAcademyofSciences,Beijing100190,China 7 Abstract. We are now witnessing the increasing availability of event stream 1 data, i.e., a sequence of events with each event typically being denoted by the 0 timeitoccursanditsmarkinformation(e.g.,eventtype).Afundamental prob- 2 lemistomodelandpredictsuchkindofmarkedtemporaldynamics,i.e.,when n the next event will take place and what its mark will be. Existing methods ei- a therpredictonlythemarkorthetimeofthenextevent,orpredictbothofthem, J yet separately.Indeed, inmarked temporaldynamics, thetimeandthemarkof 4 thenexteventarehighlydependentoneachother,requiringamethodthatcould 1 simultaneously predict both of them. To tackle this problem, in this paper, we proposetomodelmarkedtemporaldynamicsbyusingamark-specificintensity ] functiontoexplicitlycapturethedependencybetweenthemarkandthetimeof G thenextevent.Extensiveexperimentsontwodatasetsdemonstratethatthepro- L posedmethodoutperformsstate-of-the-artmethodsatpredictingmarkedtempo- . raldynamics. s c [ Keywords: markedtemporaldynamics,recurrentneuralnetwork,eventstream data 1 v 8 1 1 Introduction 9 3 Thereisanincreasingamountofeventstreamdata,i.e.asequenceofeventswitheach 0 event being denoted by the time it occurs and its mark information(e.g. event type). . 1 Markedtemporaldynamicsoffersusawaytodescribethisdataandpotentiallypredict 0 events.Forexample,inmicrobloggingplatforms,markedtemporaldynamicscouldbe 7 used to characterize a user’s sequence of tweets containing the posting time and the 1 : topicasmark[8];inlocationbasedsocialnetworks,thetrajectoryofausergivesriseto v amarkedtemporaldynamics,reflectingthetimeandthelocationofeachcheck-in[15]; i X in stock market, marked temporal dynamics corresponds to a sequence of investors’ r tradingbehaviors,i.e., biddingorasking orders,with the typeof tradingas mark[3]; a An ability to predict marked temporaldynamics, i.e., predicting when the next event will take place and what its mark will be, is not only fundamental to understanding theregularityorpatternsoftheseunderlyingcomplexsystems,butalsohasimportant implicationsinawiderangeofapplications,fromviralmarketingandtrafficcontrolto riskmanagementandpolicymaking. Existingmethodsforthisproblemfallintothreemainparadigms,eachwithdiffer- entassumptionsandlimitations.Thefirstcategoryofmethodsfocusesonpredictingthe 2 YongqingWang,ShenghuaLiu,HuaweiShen,XueqiCheng markofthenextevent,formulatingtheproblemasadiscrete-timeorcontinuous-time sequencepredictiontask[12,24].Thesemethodsgainedsuccessatmodelingthetran- sition probabilityacross marks of events. However,they lack the power at predicting whenthenexteventwilloccur. Thesecondcategoryofmethods,oncontrary,aimstopredictwhenthenextevent willoccur[9].Thesemethodseitherexploittemporalcorrelationsforprediction[20,23] orconductpredictionbymodelingthetemporaldynamicsusingcertaintemporalpro- cess, such asself-excitingHawkesprocess[5,1], variousPoisson process[22,8] , and other auto-regressive processes [7,16]. These methods have been successful used in modelingandpredictingtemporaldynamics.However,thesemodelsareunabletopre- dictthemark. Besides the above two categories of methods, researchers recently attempt to di- rectly model the marked temporal dynamics [10]. A recent work [6] used recurrent neural network to automatically learn history embedding, and then predict both, yet separately,the time and the mark of the nextevent. This work assumesthat time and markareindependentoneachothergiventhehistoricalinformation.Yet,suchassump- tionfailstocapturethedependencybetweenthetimeandthemarkofthenextevent.For example,whenyouhavelunchisaffectedbyyourchoiceonrestaurants,sincedifferent restaurants imply difference in geographic distance and quality of service. The sepa- rated prediction by maximizing the probability on mark and time does not imply the mostlikelyevent.Insum,westilllackamodelthatcouldconsidertheinterdependency ofmarkandtimewhenpredictingthenextevent. Inthispaper,weproposeanovelmodelbasedonrecurrentneuralnetwork(RNN), namedRNN-TD,to capturethedependencebetweenthemarkofaneventanditsoc- curring time. The key idea is to use a mark-specific intensity function to model the occurringtimeforeventswith differentmarks.Besides, RNN canhelpto relievesthe disscussion of the explicit dependency structure among historical events, which em- bedssequentialcharacteristics.Thebenefitsofourproposedmodelarethree-fold:1)It modelsthemarkandthetimeofthenexteventsimultaneously;2)Themark-specificin- tensityfunctionexplicitlycapturesthedependencybetweentheoccurringtimeandthe markofanevent;3)TheinvolvementofRNNsimplifiesthemodelingofdepenedency onhistoricalevents. Weevaluatetheproposedmodelbyextensiveexperimentsonlarge-scalerealworld datasets from Memetracker1 and Dianping2. Compared with several state-of-the-art methods,RNN-TDoutperformsthematpredictionofmarksandtimes.Wealsoconduct casestudytoexplorethecapabilityofeventpredictioninRNN-TD.Theexperimental resultsprovethatitcanbettermodelmarkedtemporaldynamics. 2 Proposed Model Inthispaper,wefocusontheproblemofmodelingmarkedtemporaldynamics.Before diving into the details of the proposed model, we first clarify two main motivations underlyingourmodel. 1http://www.memetracker.org 2http://www.dianping.com MarkedTemporalDynamicsModelingbasedonRecurrentNeuralNetwork 3 Time interval distribution 100 (leaving from mark #6) on dianping Whh h (cid:127)(ti) Wht i-1W (cid:127)h s(•) ti+1 CDF activation hi etaxrpgeect ttaot imonark #1 (cid:127)(ei) Whe W(cid:127)h r(•) ei+1 target to mark #2 10-1 ttaarrggeett ttoo mmaarrkk ##69 target to mark #13 101 Time1 i0n2terval 103 operation vector transfer copy concatenate (a) (b) Fig.1.(a)Highvarianceexistedintimeintervaldistributionwhentargetingtodifferentmarks. (b) The architecture of RNN-TD.Given the event sequence S = {(ti,ei)}i=1, the i-th event (ti,ei) is mapped through function φ(t) and ϕ(e) into vector spaces as inputs in RNN. Then theinputsφ(ti)andϕ(ei)associatedwiththelastembeddinghi−1 arefedintohiddenunitsin ordertoupdatehi.Dependentonembeddinghi,RNN-TDoutputsthenexteventtypeei+1and correspondinttimeti+1. 2.1 Motivation Inrealscenarios,markandtime ofnexteventarehighlydependentoneachother.To validate this point, we focus on a practical case in Dianping.We extract the trajecto- riesstartingfromthesamelocation(mark#6)andgetthestatisticresultstoexamineif thetimeintervalbetweentwoconsecutiveeventsarediscriminativetoeachotherwith respecttodifferentmarks.Thestatisticsoftimeintervaldistributionarerepresentedin Fig.1(a).Wecanobservethatlargevarianceexistsinthedistributionswhenconsumers makedifferentchoices.Itmotivatesustomodelmark-specifictemporaldynamics. Second, existing works [12,24] attempted to formulate marked temporal dynam- icsbyMarkovrandomprocesseswithvaryingorders.However,thegenerationofnext eventrequiresstrongpriorknowledgeondependencyofhistory.Besides,longdepen- dencyonhistorycausesstate-spaceexplosionprobleminpractice.Therefore,wepro- pose a RNN-based model which learns the dependencyby deep structure. It embeds historyinformationintovectorizedrepresentationwhenmodelingsequences.Thegen- erationofnexteventisonlydependentonhistoryembedding. 2.2 ProblemFormulation AneventsequenceS = {(ti,ei)}i=1 isasetofeventsinascendingorderoftime.The tuple (t ,e ) records the i-th event in the sequence S, and the variables t ∈ T and i i i e ∈ E denotethe time and the mark respectively,where E is a countablestate space i includingall possiblemarksandT ∈ R+ is the time spacein whichobservedmarks takeplace.Wecouldhavevariousinstantiationindifferentapplications. 4 YongqingWang,ShenghuaLiu,HuaweiShen,XueqiCheng Thenthe likelihoodofobservedsequenceS can beunfoldedby chainrule asfol- lows, |S| P(S)= p(t ,e |H ), i i ti iY=1 whereH = {(t ,e )|t < t ,e ∈ E} referstoallrelatedhistoricaleventsoccurring ti l l l i l beforet .Inpractice,thejointprobabilityofapairofmarkandtimecanbewrittenby i Bayesianruleasfollows p(t ,e |H )=r(e |H )s(t |e ,H ), (1) i i ti i ti i i ti wherer(e |H )isthetransitionprobabilityrelatedtoe ands(t |e ,H )istheproba- i ti i i i ti bilitydistributionfunctionoftimegivenaspecificmark. Then we propose a general model to parameterize r(e |H ) and s(t |e ,H ) in i ti i i ti markedtemporaldynamicsmodeling,namedRNN-TD.Recurrentneuralnetwork(RNN) isafeed-forwardneuralnetworkformodelingsequentialdata.InRNN,thecurrentin- putsarefedintohiddenunitsbynonlineartransformation,jointlywiththeoutputsfrom theprevioushiddenunits.Thefeed-forwardarchitectureisreplicativeinbothinputsand outputssothattherepresentationofhiddenunitsisdependentonnotonlycurrentinputs butalsoencodedhistoricialinformation.Theadaptivesizeofhiddenunitsandnonlin- ear activation function (e.g., sigmoid, tangent hyperbolic or rectifier function) make neuralnetworkcapableofapproximatingarbitrarycomplexfunctioninhugefunction space[2]. ThearchitectureofRNN-TDisdepictedinFig.1(b).Theinputsofaevent(t ,e ) i i is vectorizedbymappingfunctionφ(·) andϕ(·). Thenthe i-thinputsassociatedwith thelastembeddingh arefedintohiddenunitsinordertoupdateh .Giventhei-th i−1 i event(ti,ei),theembeddinghi−1 andmappingfunctinφandϕ,therepresentationof hiddenunitsinRNN-TDcanbecalculatedas hi =σ Whtφ(ti)+Wheϕ(ei)+Whhhi−1 , (2) (cid:0) (cid:1) where σ is the activation function, and Wht, Whe and Whh are weight matrices in neuralnetwork.Theprocedureisiterativelyexecuteduntiltheendofsequence.Thus, theembeddingh encodesthei-thinputsandthehistoricalcontexth . i i−1 Basedonthehistoryembeddingh ,wecanderivetheprobabilityofthe(i+1)-th i eventinanapproximativeway, p(ti+1,ei+1|Hti+1)≈p(ti+1,ei+1|hi)=r(ei+1|hi)s(ti+1|ei+1,hi). (3) Firstly we formalize the conditional transition probability r(ei+1|hi). The condi- tionaltransitionprobabilitycanbederivedbya softmaxfunctionwhichiscommonly usedinneuralnetworkforparameterizingcategoricaldistribution,thatis, exp Wαhh r(ei+1|hi)= K e(cid:0)xpkWαih(cid:1)h , (4) j=1 j i P (cid:0) (cid:1) whererowvectorWαh isk-throwofweightmatrixindexedbythemarke . k i+1 MarkedTemporalDynamicsModelingbasedonRecurrentNeuralNetwork 5 Thenwe considertheprobabilitydistributionfunctions(ti+1|ei+1,hi). Theprob- ability distribution function describesthe observation that nothingbut mark e oc- i+1 curreduntiltimet sincethelastevent.WedefinearandomvariableT aboutoccur- i+1 e ingtimeofnexteventwithrespecttomarke,andtheprobabilitydistributionfunction s(ti+1|ei+1,hi)canbeformalizedas s(ti+1|ei+1,hi)=P(Tei+1 =ti+1|ei+1,hi) P(Te >ti+1|ei+1,hi), (5) e∈EY\ei+1 where the probability P(Te > ti+1|ei+1,hi) depicts that the occuring time of event withmarkeisoutoftherange[0,ti+1],andP(Tei+1 =ti+1|ei+1,hi)istheconditional probabilitydensityfunctionrepresentingthefactthatmarke isocurringuntiltime i+1 t . i+1 ToformalizetheEq.(5),wedefinemark-specificconditionalintensityfunctionas fe(ti+1|ei+1,hi) λe(ti+1)= , (6) 1−Fe(ti+1|ei+1,hi) where Fe(ti+1|ei+1,hi) is the cumulative distribution function of fe(ti+1|ei+1,hi). AccordingtoEq.(6),wecanderivethecumulativedistributionfunction ti+1 Fe(ti+1|ei+1,hi)=1−exp(− λe(τ)dτ). (7) Z ti TheprobabilityofP(Te >ti+1|ei+1,hi)=1−Fe(ti+1|ei+1,hi).Thenwecanderive themark-specificconditionalprobabilitydensityfunctionbyEq.(7)as ti+1 P(Te=ti+1|ei+1,hi)=fe(ti+1|ei+1,hi)=λe(ti+1)exp(− λe(t)dt). (8) Zti SubstitutingEq.(7)andEq.(8)intothelikelihoodofEq.(5),wecanget ti+1 s(ti+1|ei+1,hi)=λei+1(ti+1)exp(−Z λ(t)dt), (9) ti whereλ(τ)= λ (τ)isthesummationofallconditionalintensityfunction. e∈E e The key toPspecify probabilitydistribution functions(ti+1|ei+1,hi) is parameter- ization of mark-specificconditionalintensity functionλ . We parameterizeλ condi- e e tionedonh asfollows, i λ (t)=ν ·τ(t;t )=exp Wνhh τ(t;t ), (10) e e i k i i (cid:0) (cid:1) whererowvectorWνhdenotestothek-throwofweightmatrixcorrespondingtomark k e.InEq.(10),themark-specificconditionalintensityfunctionissplitedintotwoparts: ν = exp(Wνhh ) is a nonnegative scalar as the constant part with respect to time e j′ i t, and τ(t;t ) ≥ 0 refersto anarbitrarytime shapingfunction[9]. Forsimplicity,we i considertwowell-knownparametricmodelsfortimeshapingfunction:exponentialand constant,i.e.,exp(wt)andc. 6 YongqingWang,ShenghuaLiu,HuaweiShen,XueqiCheng Atlast,givenacollectionofeventsequencesC ={S }N ,wesupposethateach m m=1 event sequence S is independent of others. As a result, the logarithmic likelihood m of a set of event sequences is the sum of the logarithmic likelihood of the individual sequence. Given the source of event sequence, the negative logarithmic likelihood of thesetofeventsequencesC canbeestimatedas, N |Sm|−1 K L(C)=− Wkαhhi−log exp Wjαhhi mX=1 Xi=1 " Xj=1 (cid:16) (cid:17) ti+1 +Wkνhhi+logτ(t;ti)− exp Wjν′hhi τ(t;ti)dt . Xe∈E (cid:16) (cid:17)Zti # In addition, we want to induce sparse structure in vector ν in order that not all event types are available to be activated based on h . For this purpose, we introduce lasso i regularizationon ν, i.e., kνk1 [25]. Overall, we can learn parametersof RNN-TD by minimizingthenegativelogarthmiclikelihood argminL(C)+γkνk1, (11) W whereγ isthetrade-offparameter. As last, we can estimate the nextmost likely eventsin two steps by RNN-TD: 1) estimatethetimeofeachmarkbyexpectationti+1 = t∞i t·s(t|ei+1,hi)dt;2)calculate thelikelihoodofeventsaccordingtothemark-specifiRcexpectationtime,andthenrank eventsindescendingorderoflikelihood. 3 Optimization Inthissection,weintroducethelearningprocessofRNN-TD.Weapplyback-propagation throughtime(BPTT)[4]forparameterestimation.WithBPTTmethod,weneedtoun- foldtheneuralnetworkinconsiderationofsequencesize|S |andupdatetheparam- m eters once after the completed forward process in sequence. We employ Adam [13], anefficientstochasticoptimizationalgorithm,withmini-batchtechniquestoiteratively updateallparameters.Wealsoapplyearlystoppingmethod[21]topreventoverfitting in RNN-TD. The stopping criterion is achieved when the performance has no more improvementin validation set. The mapping function of φ(t) is defined by temporal featuresassociatedwitht,e.g.,logarithmtimeintervallog(ti−ti−1)anddiscretization ofnumericalattributesonyear,month,day,week,hour,mininute,andsecond.Besides, we employ orthogonal initialization method [11] for RNN-TD in order to speed up convergenceintrainingprocess.Theembeddinglearnedbyword2vec[18,19]is used toinitializetheparameterofmappingfunctionϕ(e). Thegoodinitializationprovided bytheembeddingcanspeedupconvergenceforRNN[17]. 4 Experiments Firstly, we introduce baselines, evaluation metrics and datasets of our experiments. Then we conduct experiments on real data to validate the performance of RNN-TD incomparisonwithbaselines. MarkedTemporalDynamicsModelingbasedonRecurrentNeuralNetwork 7 4.1 Baselines Bothmarkpredictionandtimepredictionareevaluated,andthefollowingmodelsare chosenforcomparisonsinthetwopredictiontasks. (1)Marksequencemodeling. – MC:Themarkovchainmodelisaclassicsequencemodelingmethod.Wecompare withmarkovchainofvaryingordersfromonetothree,denotedasMC1,MC2and MC3. – RNN:RNNisastate-of-the-artmethodfordiscretetimesequencemodeling,suc- cessfully applied in language model. To fairly justify the performance between RNN andourproposedmethod,We use thesame inputsin bothRNN andRNN- TD. (2) Temporal dynamics modeling. We choose point processes and mark-specific pointprocesseswithdifferentcharacterizationsasbaselines. – PP-poisson:Theintensityfunctionrelatedtomarkisparameterizedbyaconstant, depictingtheleavingratefromlastevent. – PP-hawkes:Theintensityfunctionrelatedtomarkeisparameterziedby t−t i λ(t;e)=λ(0;e)+α exp − , (12) (cid:18) σ (cid:19) tXi<t whereσ =1andλ(0;e)isaintrinsicratedefinedonmarkewhent=0. – MSPP-poisson: We define the mark-specific intensity function by a parametric matrix,depictingtheratefromonemarktoanother. – MSPP-hawkes:Themark-specificintensityfunctionisparameterizedbyEq.(12) wheretheconstantrateisspecializedaccordingtomarkpairsinparametricmatrix. We also compare with the model that has the ability to generate both mark and temporalsequences. – RMTPP: Recurrent marked temporal point process (RMTPP) [6] is a method whichindependentlymodelsbothmarkandtimeinformationbasedonRNN. 4.2 EvaluationMetrics Serveralevaluationmetricsareusedwhenmeasuringtheperformanceinmarkpredic- tionandtimepredictiontasks.Weregardthemarkpredictiontaskasarankingproblem with respectto transition probability.The predictionperformanceis evaluatedby Ac- curacyontopk(Acc@k)andMeanReciprocalRank(MRR)[26].Ontimeprediction task,wedefinetoleranceθoverthepredictionerrorbetweenestimatedtimeandpracti- caloccuringtime.Thepredictionaccuracyontimepredictionwithrespecttotolerance θisformulatedas, Acc@θ = Nm=1 i|S=m1|−1δ(|E(t;ei+1,hi)−ti+1|<θ), P P N (|S |−1) m=1 m P whereδ isan indicatorfunction.LargerscoresinAcc@k, MRR andAcc@θ indicate betterpredictions. 8 YongqingWang,ShenghuaLiu,HuaweiShen,XueqiCheng Table1.Performanceofmarkpredictionontwodatasets MRR Acc@1Acc@3Acc@5Acc@10Acc@20 MC1 0.4634 0.2948 0.4595 0.6659 0.8253 0.9209 MC2 0.4788 0.3155 0.4706 0.6773 0.8301 0.9186 MC3 0.4670 0.3149 0.4583 0.6550 0.7891 0.8619 RNN 0.4780 0.3202 0.4746 0.6825 0.8315 0.9201 Memetracker RMTPP 0.4833 0.3241 0.4834 0.6926 0.8386 0.9267 RNN-TD(c) 0.4820 0.3220 0.4790 0.6895 0.8393 0.9270 RNN-TD(exp) 0.4849 0.3266 0.4835 0.6929 0.8400 0.9273 RNN-TD*(c) 0.4820 0.3220 0.4790 0.6895 0.8393 0.9270 RNN-TD*(exp)0.4851 0.3266 0.4844 0.6937 0.8407 0.9274 MC1 0.6174 0.5231 0.6157 0.7212 0.7963 0.8787 MC2 0.6260 0.5280 0.6396 0.7393 0.8007 0.8513 MC3 0.5208 0.4462 0.5395 0.6035 0.6332 0.6569 RNN 0.6355 0.5123 0.6135 0.7153 0.7905 0.8656 Dianping RMTPP 0.6620 0.5482 0.6554 0.7578 0.8271 0.8935 RNN-TD(c) 0.6663 0.5524 0.6601 0.7628 0.8346 0.8999 RNN-TD(exp) 0.6635 0.5448 0.6560 0.7638 0.8345 0.8988 RNN-TD*(c) 0.6663 0.5524 0.6602 0.7628 0.8346 0.8999 RNN-TD*(exp)0.6635 0.5452 0.6566 0.7641 0.8351 0.8990 p.s.theexperimentalresultsfrom*aredependentwithgiventime. 4.3 Datasets We conductexperimentsontwo realdatasetsfromtwo differentscenariostoevaluate theperformanceofdifferentmethods: – Memetracker[14]:Memetrackercorpuscontainsarticlesfrommainstreammedia and blogs from August 1 to October 31, 2008 with about 1 million documents perday.Contentsinthecorpusareorganizedaccordingtotopicsbytheproposed methodin[14].Weusetop165frequenttopicsandorganizethepostingsequence about posted blogs and post-time by users. The whole posting sequence of each user is splited into partsas follows,1) get the statistics of time intervalsbetween twoconsecutivepostedblogs,2)empiricallyestimatetheperiodofuser’sposting behavior,3)anddividethewholesequenceintoseveralpartsaccordingtothees- timatedperiod.Toavoidprocessingshortsequences,wealsoignorethesequences whose length are less than 3. The processed dataset contains 1,481,491 posting sequences, and the time interval between two consecutive blogs is ranged from 2.77×10−4to99.68hours. – Dianping:DianpingprovideanonlinerestaurantratingserviceinChina,including coupon sales, bill payment, and reservation. We extract transaction coupon sales from top 256 popular stores located in Xidan bussiness district of Beijing from year2011to2015.Theconsumptionsequencesofusersaredividedintosegments asthesamestepsdoneinmemetracker.Becauseoftheexistenceofsparseshopping recordsinusers,wealsolimitthattimeintervalbetweentwoconsecutiveconsump- tionsistwomonths.Theprocesseddatasetcontains221,893eventsequences,and MarkedTemporalDynamicsModelingbasedonRecurrentNeuralNetwork 9 RNN-TD(c) RMTPP MSPP-hawkes PP-hawkes RNN-TD(c) RMTPP MSPP-hawkes PP-hawkes RNN-TD(exp) MSPP-poisson PP-poisson RNN-TD(exp) MSPP-poisson PP-poisson 0.4 0.6 0.3 θ@ θ@0.4 Acc0.2 Acc 0.2 0.1 0.00 25 50 75 0.00 5 10 15 Estimation tolerance (hour) Estimation tolerance (min) (a) ExperimentsonMemetracker (b) ExperimentsonDianping Fig.2.Performanceoftimingpredictionontwodatasets. thetimeintervalbetweentwoconsecutiveconsumptionsisrangedfrom2.77×10−4 to1440hours. On bothdatasets, we randomlypickup 80%of sequencedata astrainingset, and therestdataaredividedintotwopartsequallyasvalidationsetandtestsetrespectively. 4.4 PerformanceofMarkPrediction TheperformanceofmarkpredictionisevaluatedusingmetricsAcc@kandMRR.The experimentalresultsareshowninTable1.ComparingwithMC1,MC2,MC3andRNN, RNN-TD(c) and RNN-TD(exp) achieve significant improvements over all metrics in bothdatasets.InMemetracker,RNN-TD(exp)outperformsRMTPPinMRRatsignif- icance level of 0.1, and achieve a little improvementsthan RMTPP in Acc@1,3,5,10 and20.However,theperformanceofRNN-TD(c)isworsethanRMTPP.InDianping, RNN-TD(c) achieves improvementsthan RMTPP in metrics of MRR and Acc@5 at significance level of 0.1 and metrics of Acc@10 and Acc@20 at significance level of 0.01. Besides, RNN-TD(exp) achieves improvements than RMTPP in metrics of Acc@20 at significance level of 0.1 and metrics of Acc@5 and Acc@10 at signifi- cancelevelof0.01.TheexperimentalresultsindicatethatRNN-TDcanbetterlearnthe markgenerationbyjointlyoptimizingmark-specificconditionalintensityfunctionwith respecttodifferenttimeshapingfunctionappliedintasks. We also conductexperimentsaccordingto eventlikelihood on RNN-TD with the given time, marked as RNN-TD*. The results of RNN-TD*(exp) performs little bet- ter than RNN-TD(exp) over all metrics in both datasets, However, the performance of RNN-TD*(c) is almostthe same as RNN-TD(c).It demonstratesthe robustnessof RNN-TD on markpredictionwhetheror not giventhe occuringtime. Besides, RNN- TDwithexponentialformoftimeshapingfunctionhaslargereffectsongiventimethan theconstantform. 10 YongqingWang,ShenghuaLiu,HuaweiShen,XueqiCheng Table2.Casestudyoneventprediction (a) onespecificeventsequencepredictiononmemetraker i-thevent:mark,time(mins) 1th 2nd 3rd c#1 Europedebt,22.32 Europedebt,12.29 Europedebt,60.31 RMTPP c#2 LinkedInIPO,22.32DominiqueStrauss,12.29 AmyWinehouse,60.31 c#3 AmyWinehouse,22.32 LinkedInIPO,12.29DominiqueStrauss,60.31 c#1 Europedebt,1.07 DominiqueStrauss,1.34 DominiqueStrauss,3.63 RNN-TD c#2 DominiqueStrauss,0.45 Europedebt,1.29 Europedebt,3.12 c#3 LinkedInIPO,0.44 LinkedInIPO,0.56 attack,2.93 GroundTruth DominiqueStrauss,6.37 attack,83.78 attack,18.18 (b) onespecificeventsequencepredictionondianping i-thevent:mark,time(days) 1th 2nd 3rd c#1 bibimbap,2.34 bibimbap,2.90Sichuancuisine,3.02 RMTPP c#2 tearestaurnt,2.34 cookies,2.90 cookies,3.02 c#3 Yunnancuisine,2.34 Sushi,2.90 tearestaurnt,3.02 c#1 bibimbap,2.93 barbecue,0.65 barbecue,0.96 RNN-TD c#2 Yunnancuisine,0.88 bibimbap,0.85Sichuancuisine,0.81 c#3 bread,0.92Vietnamesecuisine,0.51 bread,0.48 GroundTruth barbecue,0.14 Sichuancuisine,1.03 barbecue,1.06 4.5 PerformanceofTimePrediction We evaluatethe performanceof timepredictionbyAcc@θ. ThepredictionsofRNN- TDandMSPParebasedontruemarks.Fig.2(a)andFig.2(b)showtheexperimental results of RNN-TD and baselines on memetrackerand dianping.As shown in Fig. 2, without considering any mark information, PP-poisson and PP-hawkes are unable to handlethetemporaldynamicswellonbothMemetrackerandDianping.MPPcandis- criminate mark-specific time-cost, leading to better performance than PPs. In meme- trackerdataset,althoughRMTPPhasbetterperformancethanPP,itdoesnotoverbeat MSPP-poisson and MSPP-hawkes. In dianping dataset, RMTPP(c) and RMTPP(exp) achievebetterperformancethanMSPP-hawkeswhentoleranceθ ≤65hours,andalso achievebetterperformancethanMPP-poissonwhentoleranceθ ≤ 15hours.Itisseen thatRNN-TD(c)andRNN-TD(exp)achievethebestperformancethanallthebaselines in the most cases on two datasets. The improvementsachieved by RNN-TD indicate thatourproposedmethodcanwellmodelmarkedtemporaldynamicsbylearningmark- specificintensityfunctions,whileRMTPPsharethesameintensityfunctionforallthe marks. 4.6 CaseStudyonEventPrediction To explore the capability of event prediction of RNN-TD, we randomly choose one specificeventsequencefrommemetrackeranddianpingrespectively,andestimatethe nexteventsinthesequence.InRNN-TD,weselecttop3eventsindescendingorderof eventlikelihood as candidates of next event, called c#1, c#2 and c#3. In RMTPP, we choosethemostprobablemarkandexpectationtimeindependentlyandcombinethem asthecandidatesofnextevent.Table2liststheperformanceofRMTPPandRNN-TD. We can see that the predicted marks on RNN-TD are more accurate and relevant to