Label Propagation on K-partite Graphs with Heterophily

Dingxiong Deng (Dept. of Computer Science, Univ. of Southern California), Fan Bai (School of Computer Science, Fudan University), Yiqi Tang (School of Computer Science, Fudan University), Shuigeng Zhou (School of Computer Science, Fudan University), Cyrus Shahabi (Dept. of Computer Science, Univ. of Southern California), Linhong Zhu (Information Sciences Institute, Univ. of Southern California)

arXiv:1701.06075v1 [cs.LG]

ABSTRACT

In this paper, for the first time, we study label propagation in heterogeneous graphs under the heterophily assumption. Homophily label propagation (i.e., two connected nodes share similar labels) in homogeneous graphs (with the same types of vertices and relations) has been extensively studied before. Unfortunately, real-life networks are heterogeneous: they contain different types of vertices (e.g., users, images, texts) and relations (e.g., friendships, co-tagging), and allow each node to propagate both the same and the opposite copy of labels to its neighbors. We propose a K-partite label propagation model to handle the mystifying combination of heterogeneous nodes/relations and heterophily propagation. With this model, we develop a novel label inference algorithm framework with update rules in near-linear time complexity. Since real networks change over time, we devise an incremental approach, which supports fast updates for both new data and evidence (e.g., ground truth labels) with guaranteed efficiency. We further provide a utility function to automatically determine whether an incremental or a re-modeling approach is favored. Extensive experiments on real datasets have verified the effectiveness and efficiency of our approach, and its superiority over state-of-the-art label propagation methods.

Figure 1: An example of sentiment labeling on a heterogeneous network with three types of vertices: users, tweets and words. Solid edges denote observed links, and dashed edges represent label propagation directions. The width of a dashed edge indicates the strength of propagation. Green: positive label; red: negative label; homophily: propagate the same copy of labels; heterophily: propagate the opposite copy of labels.

1. INTRODUCTION

Label propagation [44] is one of the classic algorithms for learning the label information of each vertex in a network (or graph). It is a process in which each vertex receives labels from its neighbors in parallel, then updates its labels, and finally sends the new labels back to its neighbors. Recently, label propagation has received renewed interest from both academia and industry due to its various applications in many domains such as spam detection [1], fraud detection [10], sentiment analysis [15], and graph partitioning [35]. Different algorithms [11, 19, 33, 39, 41, 44] have been proposed to perform label propagation on trees or arbitrary graphs. All of these traditional algorithms simply assume that all vertices are of the same type and are restricted to a single pairwise similarity matrix (graph).

Unfortunately, many real networks such as social networks are heterogeneous systems [3, 32, 38] that contain objects of multiple types interlinked via various relations. Given this heterogeneity, it is tremendously challenging to analyze and understand such networks through label propagation. Applying traditional label propagation approaches directly to heterogeneous graphs is not feasible for the following reasons.
First, traditional approaches neither support label propagation among different types of vertices nor distinguish propagation strengths among various types of relations. Consider the example of sentiment labeling shown in Fig. 1, where the label of each tweet is estimated from the labels of words and users. The label information of users is much more reliable than that of words for deciding the labels of tweets [42], so the user vertices should have stronger propagation strengths than the word vertices. Second, traditional approaches do not support heterophily propagation. As illustrated in Fig. 1, traditional approaches assume that if a word has a positive label, then its connected tweets also have positive labels. However, it is entirely plausible that a tweet connected to a positive word is negative due to sarcasm. There are a few notable exceptions [9, 13, 20] (see the related works) that support either heterogeneous types of vertices or heterophily propagation, but not both. Last but not least, all of the current approaches simply assume that the propagation type (e.g., homophily or heterophily propagation) is given as input, although in practice it is difficult to obtain from observation.

Table 1: A summary of label propagation methods. H-V denotes whether a method supports heterogeneous types of vertices; H-P denotes whether a method supports heterophily in label propagation; Auto denotes whether the heterophily matrix is automatically learned (as opposed to predefined); Incre denotes whether the algorithm supports incremental updates upon new data or new labels; and Joint denotes whether the approach allows a vertex to be associated with multiple labels.

Method                             H-V  H-P  Auto  Incre  Joint
[4] [11] [33] [34] [39] [41] [44]  ✗    ✗    ✗     ✗      ✗
[7]                                ✗    ✗    ✗     ✗      ✓
[9] [20]                           ✓    ✗    ✗     ✗      ?
[13] [37]                          ✗    ✓    ✗     ✓      ✗
Proposed                           ✓    ✓    ✓     ✓      ✓

Figure 2: Examples of ubiquitous tripartite graphs: (a) a folksonomy (users, items, tags); (b) a text corpus (documents, words, authors).

In this paper, we study the problem of labeling nodes with categories (e.g., topics, communities, sentiment classes) in a partially labeled heterogeneous network. To this end, we first propose a K-partite graph model as the knowledge representation for heterogeneous data. Many real-life datasets naturally form K-partite graphs with different types of nodes and relations. To give a few examples, as shown in Fig. 2, in a folksonomy system the triplet relations among users, items and tags can be represented as a tripartite graph; the document-word-author relation can likewise be modeled as a tripartite graph with three types of vertices. A K-partite graph model nicely captures the combination of vertex-level heterogeneity and heterogeneous relations, which motivates us to focus on label propagation on K-partite graphs. That is, given an observed K-partite graph and a very small set of seed vertices with ground-truth labels, our goal is to learn the label information of the large number of remaining vertices. Even though K-partite graphs are ubiquitous and label propagation on them has significant impact, this area is much less studied than traditional label propagation on homogeneous graphs, and various modeling and algorithmic challenges remain unsolved.

To address the modeling challenges, we develop a unified K-partite label propagation model with both vertex-level heterogeneity and propagation-level heterogeneity. In the sentiment example of Fig. 1, our model allows a tweet vertex to receive labels from both words and users, but automatically gives higher weight to the latter through our propagation matrix. The propagation-level heterogeneity, which is reflected in the support for homophily, heterophily and mixed propagation, is learned automatically in our model. To infer our model, we first propose a framework that supports both multiplicative [25] and additive rules [18] (i.e., projected gradient descent) in a vertex-centric manner with near-linear time complexity. Because in practice graphs and labels change continuously, we then study how and when we should apply label propagation to handle such changing scenarios. We devise a fast incremental algorithm (i.e., assigning labels to new data, or updating labels upon feedback) that performs partial updates instead of re-running the label inference from scratch. We can not only control the trade-off between efficiency and accuracy via a confidence level parameter for speedup, but also determine whether we should apply our incremental algorithm at all through a utility function.

The contributions of this work are as follows:

1. Problem formulation: we show how real-life data analytic tasks can be formulated as the problem of label propagation on K-partite graphs.

2. Heterogeneity and heterophily: we propose a generic label propagation model that supports both heterogeneity and heterophily propagation.
3. Fast inference algorithms: we develop a unified label propagation framework on K-partite graphs that supports both multiplicative and additive rules with near-linear time complexity. We then propose an incremental framework, which supports much faster updates upon new data and labels than re-computing from scratch. To strike a balance between efficiency and effectiveness, we introduce a confidence level parameter for speedup. We further design a utility function that automatically chooses between an incremental and a re-computing approach.

4. Practical applications: we demonstrate with three typical application examples that various classification tasks in real scenarios can be solved by our label propagation framework with K-partite graphs.

2. RELATED WORKS

In this section, we first provide an extensive (but not exhaustive) review of state-of-the-art label propagation approaches, and then discuss related works on heterogeneous graph representation using K-partite graphs.

2.1 Label Propagation

We summarize a set of representative label propagation algorithms in Table 1. In the following, we provide more details about each approach, with emphasis on their advantages and disadvantages. First we consider how these methods support propagating labels in heterogeneous networks. Various types of algorithms have been proposed to perform label propagation on trees or arbitrary graphs, such as belief propagation [11, 39], loopy belief propagation [19], Gaussian Random Fields (GRF) [44], MP [33], MAD [34], and local consistency [41].
All of these algorithms simply assume that all vertices are of the same type and are restricted to a single graph.

Recently, with the tremendous growth of heterogeneous data, label propagation approaches for heterogeneous networks have been developed. Jacob et al. [20] focused on learning a unified latent space representation through supervised learning for heterogeneous networks. Ding et al. [9] proposed a cross-propagation algorithm for K-partite graphs, which distinguishes vertices of different types and propagates label information from vertices of one type to vertices of another type. However, they still make the homophily propagation assumption, namely that two connected nodes have similar labels (representations), even when they are of different types. On the other hand, Gatterbauer et al. [13] proposed a heterophily belief propagation algorithm for homogeneous graphs with a single type of vertices (e.g., a user graph). Yamaguchi et al. [37] also proposed a heterophily propagation algorithm that connects to random walks and Gaussian Random Fields. Both approaches assume that even if a user node is labeled as fraudulent, it should not simply propagate the fraud label to all of its neighbors. However, neither approach supports vertex-level heterogeneity, and both use the same type of propagation over the entire network. In addition, the propagation matrices (e.g., a diagonal matrix for homophily, an off-diagonal matrix for heterophily) must be predefined based on observation rather than learned automatically. In this work, we propose a unified label inference framework which supports both vertex-level heterogeneity and propagation-level heterophily (i.e., different types of propagation across heterogeneous relations). Furthermore, our framework is able to automatically learn the propagation matrices from the observed networks.

Recent research has also focused on algorithmic issues in label propagation such as incremental and joint learning for label inference. Gatterbauer et al. [13] proposed a heuristic incremental belief propagation algorithm for new data; however, more research is required to find provable performance guarantees for such incremental algorithms. Chakrabarti et al. [7] proposed a framework for jointly inferring label types such as hometown, current city, and employer for users connected in a social network. To advance existing work, we propose an incremental framework which supports adaptive updates upon both new data and new labels. In addition, our algorithm is guaranteed to achieve a speedup over re-computing algorithms with a certain confidence. It also supports multi-class label joint inference and provides a better understanding of the latent factors that cause link formation in the observed heterogeneous networks.

2.2 K-partite Graphs

K-partite graph analysis has wide applications in many domains such as topic modeling [27], community detection [29], and sentiment analysis [42]. Most of these works use tripartite graph modeling as a unified knowledge representation for heterogeneous data, and then formulate real data analytic tasks as the corresponding tripartite graph clustering problem. For example, Long et al. [27] proposed a general model, the relation summary network, to find hidden structures (the local cluster structures and the global community structures) in a K-partite graph; Zhu et al. [42] addressed both static and online tripartite graph clustering with matrix co-factorization. Other works study theoretical issues such as competition numbers of tripartite graphs [22].

To summarize, K-partite graph modeling and analysis has been studied from different perspectives due to its potential in various important applications, yet studies on learning with K-partite graph modeling are limited. In this work, we formulate a set of traditional classification tasks, such as sentiment classification, topic categorization, and rating prediction, as the label propagation problem on K-partite graphs. With the observed K-partite graphs, our label inference approach is able to obtain decent accuracy with very few labels.

3. PROBLEM FORMULATION

Let $t, t'$ denote vertex types. A K-partite graph $G = \langle \bigcup_{t=1}^{K} V_t,\ \bigcup_{t=1}^{K} \bigcup_{t'=1}^{K} E_{tt'} \rangle$ has $K$ types of vertices and contains at most $K(K-1)/2$ two-way relations. We also use the notation $G$ for the adjacency matrix representation of a K-partite graph, and $G_{tt'}$ for the subgraph/matrix induced by the set of $t$-type vertices $V_t$ and the set of $t'$-type vertices $V_{t'}$. For ease of presentation, Table 2 lists the notations used throughout this paper.

Table 2: Notations and explanations.

Notation            Explanation
n / m / k / K       number of nodes / edges / labels / types
G / B / Y           graph / propagation / label assignment matrix
t(v) / d(v) / N(v)  the type / degree / neighbors of vertex v
Y(v) ∈ R^{k×1}      the label assignment of vertex v
M_{tt'}             the sub-matrix of a matrix M between t-type and t'-type vertices
η                   step size in the additive rule
ε                   a very small positive constant
θ                   confidence level parameter in the incremental algorithm
1_A(x)              indicator function
L                   Lipschitz constant
J(x)                objective function of x
L(x)                Lagrangian function of x

Intuitively, the purpose of label propagation on a K-partite graph is to utilize the observed labels of seed vertices (denoted $V^L$) to infer label information for the remaining unlabeled vertices (denoted $V^U$). For each vertex $v$, we use a column vector $Y(v) \in \mathbb{R}^{k \times 1}$ to denote the probabilistic label assignment of vertex $v$, and a matrix $Y \in \mathbb{R}^{n \times k}$ to denote the probabilistic label assignment of all vertices, where $n$ is the number of nodes and $k$ is the number of labels.
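To make the inputs concrete, the following minimal sketch shows one plausible in-memory layout for a K-partite graph with typed vertices, per-type-pair sparse adjacency blocks, and seed labels. It is an illustration of the formulation above, not the authors' implementation; the class and field names (KPartiteGraph, blocks, and so on) are our own.

import numpy as np
from scipy.sparse import csr_matrix

class KPartiteGraph:
    """Illustrative container for G = <U_t V_t, U_tt' E_tt'>.

    Vertices of each type t are numbered 0..n_t-1, and each observed
    relation E_tt' is stored as a sparse n_t x n_t' block G_tt'.
    """
    def __init__(self, sizes):
        self.sizes = dict(sizes)   # type name -> n_t
        self.blocks = {}           # (t, t') -> csr_matrix G_tt'

    def add_relation(self, t, t2, rows, cols):
        shape = (self.sizes[t], self.sizes[t2])
        data = np.ones(len(rows))
        self.blocks[(t, t2)] = csr_matrix((data, (rows, cols)), shape=shape)
        # store both orientations so neighbors can be scanned from either side
        self.blocks[(t2, t)] = self.blocks[(t, t2)].T.tocsr()

# A tripartite toy graph mirroring Fig. 1: users, tweets, words.
g = KPartiteGraph({"user": 3, "tweet": 3, "word": 4})
g.add_relation("user", "tweet", rows=[0, 1, 2], cols=[0, 1, 2])
g.add_relation("tweet", "word", rows=[0, 0, 1, 2], cols=[0, 1, 2, 3])

# Seed labels Y*(v) for a k = 2 (positive/negative) problem.
Y_star = {("user", 0): np.array([1.0, 0.0])}   # user 0 observed positive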
Therefore, our objective is to infer the matrix $Y$ so that each unlabeled vertex $v \in V^U$ obtains an accurate estimate of its ground truth label.

We first build intuition about our model using the running example shown in Fig. 3(a). Suppose we only observe the links and very few vertex labels, as shown in Fig. 3(b); the purpose of label propagation on the tripartite graph is then to utilize the observed links to infer the label information. We introduce a matrix $B \in \mathbb{R}^{k \times k}$ to represent the correlation between link formation and the associated labels. Each entry $B(l_i, l_j)$ denotes the likelihood that a node labeled $l_i$ is connected with another node labeled $l_j$. Subsequently, in the label propagation process, each entry $B(l_i, l_j)$ also denotes a proportional propagation that indicates the relative influence of nodes with label $l_i$ on nodes with label $l_j$. To keep consistent with the label propagation process, we interpret $B$ as the propagation matrix.

Figure 3: An illustration of our label inference process with a tripartite graph example. Different shapes/colors distinguish types/labels (g: green, r: red; blue denotes an overlapping assignment of both green and red). (a) Ground truth; (b) Observation; (c) Label embedding; (d) Example propagation matrices.

Note that $B$ is an arbitrary non-negative matrix without any restrictions. If $B$ is diagonal (e.g., the $B$ from $V_a$ to $V_c$ in Fig. 3(d)), the link formation is consistent with the homophily assumption; if $B$ is off-diagonal, the link formation is more likely due to heterophily. In addition, as shown in Fig. 3(d), the propagation matrix between $a$-type and $b$-type vertices differs from that between $a$-type and $c$-type vertices. We thus let $B_{tt'}$ denote the propagation matrix between $t$-type and $t'$-type vertices.
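As a toy illustration (our own, with made-up numbers) of how the two propagation regimes act on a label vector: with $k = 2$ labels, a diagonal $B$ pushes a neighbor toward the same label, while an off-diagonal $B$ pushes it toward the opposite one.

import numpy as np

B_homo = np.array([[1.0, 0.0],     # green tends to link to green
                   [0.0, 1.0]])    # red tends to link to red
B_hetero = np.array([[0.0, 1.0],   # green tends to link to red
                     [1.0, 0.0]])  # red tends to link to green

y_word = np.array([1.0, 0.0])      # a word labeled positive (green)

print(B_homo.T @ y_word)    # [1. 0.]: neighbor pulled toward positive
print(B_hetero.T @ y_word)  # [0. 1.]: neighbor pulled toward negative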
Generally speaking, if we had prior knowledge of the propagation matrix $B$, then each unlabeled vertex could receive proportional propagation from the few labeled vertices according to $B$. We propose to infer the propagation matrix and the label assignment jointly via embedding. We embed each vertex into a unified space (shared by all types of vertices), where each dimension of the latent space denotes a specific label and the link probability of two vertices is correlated with their labels (i.e., their distance in the latent space). Fig. 3(c) shows a simplified two-dimensional space. With the label embedding, the label assignment and the correlation can be learned automatically.

With these intuitions, we focus on the following problem:

Problem 1 (K-partite Graph Label Inference). Given a K-partite graph, a set of labels $L$, and a set of seed labeled vertices $V^L$ with ground truth labels $Y^* \in \mathbb{R}^{n_L \times k}$ ($n_L = |V^L|$), infer the label assignment matrix $Y$ and the propagation matrices $B$ such that

$$\arg\min_{Y, B \ge 0} \Big\{ \sum_{t} \sum_{t' \ne t} \|G_{tt'} - Y_t B_{tt'} Y_{t'}^T\|_F^2 + \beta \sum_{u \in V^L} \|Y(u) - Y^*(u)\|_F^2 + \lambda\, \mathrm{regularizer}(G, Y) \Big\} \quad (1)$$

where $t$ denotes the type of a vertex, $Y_t$ denotes the sub-matrix of $Y$ that gives the label assignment of the set of $t$-type vertices $V_t$, $\beta$ and $\lambda$ are parameters that control the contributions of the different terms, and $\mathrm{regularizer}(G, Y)$ denotes a regularization approach such as graph regularization [5], sparsity [14], diversity [40], or complexity regularization.

In our objective function, the first term evaluates how well each $B$ matrix represents the correlation between link formation $G$ and the associated labels $Y$, by computing the error between the estimated link formation probability and the observed graph. The second term $\sum_u \|Y(u) - Y^*(u)\|_F^2$ penalizes seed vertices whose learned labels are far from the ground truths. The regularization term provides an add-on property that utilizes additional domain-specific knowledge to improve learning accuracy.
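For concreteness, here is a minimal dense-numpy sketch of evaluating the objective in Eq. (1) without the regularizer term. The data layout and names are ours, and a practical implementation would exploit sparsity rather than forming $Y_t B_{tt'} Y_{t'}^T$ densely.

import numpy as np

def objective(G_blocks, Y, B, Y_star, labeled, beta):
    """Evaluate Eq. (1) minus the lambda * regularizer(G, Y) term.

    G_blocks: dict (t, t2) -> n_t x n_t2 adjacency block G_tt' (t != t2)
    Y:        dict t -> n_t x k label assignment Y_t
    B:        dict (t, t2) -> k x k propagation matrix B_tt'
    Y_star:   dict t -> n_t x k ground truth rows (arbitrary if unlabeled)
    labeled:  dict t -> boolean mask marking the seed vertices of type t
    """
    J = 0.0
    # First term: reconstruction error of every observed relation block.
    for (t, t2), G in G_blocks.items():
        J += np.linalg.norm(G - Y[t] @ B[(t, t2)] @ Y[t2].T, "fro") ** 2
    # Second term: penalty on seeds that drift from their ground truth.
    for t, mask in labeled.items():
        J += beta * np.linalg.norm(Y[t][mask] - Y_star[t][mask], "fro") ** 2
    return J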
In conclusion, our problem definition addresses all of the modeling challenges mentioned above. Unfortunately, besides the modeling challenges, several computational challenges remain. First, the NP-hardness of Problem 1 (its subproblem, nonnegative matrix factorization, is NP-hard [36]) requires efficient solutions for large-scale real problems. Second, the rapid growth of data and feedback requires fast incremental updates. In the following, we address these computational challenges by developing efficient algorithms that are highly scalable and achieve high quality in terms of classification accuracy on real-life tasks.

4. LABEL INFERENCE ALGORITHMS

In this section, we discuss the algorithmic issues of the label inference process. Specifically, our label inference problem leads to an optimization problem that searches for the optimal weighted propagation matrices $B$ and label assignment matrix $Y$ minimizing the objective function in Eq. (1). Given the nonnegativity constraints, different approaches [18, 26, 21] have been proposed to solve this non-convex optimization problem. Among them, the multiplicative update [25] and additive update rules [26] are the two most popular approaches because of their effectiveness. The multiplicative update approach batch-updates each entry of a matrix by multiplying it by a positive coefficient at each iteration, while the additive update approach is a projected gradient descent method. To the best of our knowledge, no existing work combines different update rules in a unified framework.

In this paper, we propose to combine these two update rules in a unified framework. We observe that the typical multiplicative update rule, which batch-updates each entry of the matrix, can be transformed into a vertex-centric rule that updates one row of the matrix per iteration. This transformation allows us to unify both rules under the same vertex-centric label propagation framework, because many additive rules also update one row per iteration [26, 21]. We further notice that the multiplicative and additive update rules share some common computations. Consequently, by pre-computing these common terms, our framework is independent of the particular update rule while remaining as efficient as possible. The proposed framework enjoys three important by-products: (1) it supports fair comparison between different update rules; (2) the single incremental algorithm of Section 5 naturally supports various update rules; and (3) it is easy to parallelize, because the updates of the individual vertices can be performed at the same time.

Algorithm 1 presents the framework of our vertex-centric label inference algorithm, which consists of three steps: initialization, an update of the propagation matrices $B$, and an update of the label assignment $Y(u)$ of each vertex.

Algorithm 1 The unified label inference framework
Input: Graph G, a few ground truths Y*
Output: Label matrix Y
01: initialize Y and B (see Section 4.1)
02: repeat
03:   update B_tt' (see Section 4.3)
04:   update the common terms A_t (see Eq. (6))
      /* vertex-centric search (block coordinate search) */
05:   for each vertex u ∈ V
06:     update and/or normalize Y(u) (see Section 4.2)
07:   Y_s = Y
08:   Y = argmin_Y ||Y_s − Y||_F^2 + λ regularizer(G, Y)
09: until convergence
10: return Y

We first provide a non-negative initialization of both $Y$ and $B$ (Section 4.1). We then iteratively update $B$ (Section 4.3) and $Y$ (Section 4.2) until the solution converges (Lines 2-10), with either multiplicative or additive rules. The computational cost is dominated by updating the label assignments of all vertices (Lines 5-6). To reduce this cost, when updating the label assignment matrix $Y$ in Section 4.2 we design an efficient caching technique that pre-computes and reuses the common terms shared by all vertices of the same type (i.e., $A_t$ for the $t$-type vertices) in both update rules. We show that by pre-computing $A_t$, the computational time for each $Y_t$ decreases, and consequently the computational cost per iteration is much reduced.

In the following, we first present the details of the three components of our unified algorithm framework (Sections 4.1-4.3); we then show the equivalence between the element-wise multiplicative rule and the vertex-centric multiplicative rule, and analyze the theoretical properties of the different update rules (Section 4.4).

4.1 Initialization

To reach a better local optimum, a label inference algorithm should start from one or more relatively good initial guesses. In this work, we focus on graph-proximity-based initialization for the label assignment matrix $Y$, while the $B$ matrices are initialized using the observed label propagation information among the labeled seed vertices.

Initializing Y. Given the set of labeled vertices $V^L$ with ground truth labels $Y^* \in \mathbb{R}^{n_L \times k}$, we utilize graph proximity to initialize each unlabeled vertex from similar labeled vertices of the same type. Specifically, the label assignment matrix $Y^0$ is initialized as follows:

$$Y^0(u) = \begin{cases} Y^*(u) & \text{if } u \in V^L \\ \mathrm{avg}_v\ \mathrm{sim}(u, v, G)\, Y(v) & \text{otherwise} \end{cases} \quad (2)$$

where $\mathrm{sim}(u, v, G)$ evaluates the graph proximity between vertices $u$ and $v$. In our experiments, we define $\mathrm{sim}(u, v, G)$ as the normalized Adamic-Adar score [2]. To evaluate the similarity between two vertices, we sum over the neighbors the two vertices have in common, weighting neighbors that are unique to a few vertices more heavily than commonly occurring neighbors. That is,

$$\mathrm{sim}(u, v, G) = \sum_{w \in N(u) \cap N(v)} \frac{1}{\log d(w)} \quad (3)$$

We first compute $\mathrm{sim}(u, v, G)$ with Eq. (3) and then normalize all scores into the range $[0, 1]$. Here $d(u)$ is the degree of vertex $u$, and $N(u)$ is the set of neighbors of vertex $u$.

Initializing B. We initialize each $B_{tt'}$ based on the label class information and vertex type information of the observed labeled vertices. Specifically, we first initialize each $B_{tt'}$ as an identity matrix, which assumes that if $t$-type vertices are in class $l_i$, then all of their connected $t'$-type vertices will receive the corresponding class-$l_i$ label. We then increment $B_{tt'}(l_i, l_j)$ by one whenever we observe a $t$-type vertex labeled $l_i$ connected to a $t'$-type vertex labeled $l_j$. Finally, we normalize each $B_{tt'}$ using the $L_1$ norm.
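Below is a minimal sketch of the $Y$ initialization of Eqs. (2)-(3) for a single unweighted adjacency matrix; the helper names are ours, the averaging in Eq. (2) is read as a similarity-weighted average over the seed vertices, and common neighbors of degree one are skipped to avoid dividing by log 1 = 0.

import numpy as np

def adamic_adar(adj, u, v):
    """Eq. (3): sum of 1 / log d(w) over the common neighbors w of u and v."""
    deg = adj.sum(axis=1)
    common = np.flatnonzero(adj[u] * adj[v])
    return sum(1.0 / np.log(deg[w]) for w in common if deg[w] > 1)

def init_Y(adj, Y_star, labeled, k):
    """Eq. (2): seeds keep Y*; other vertices average seed labels by proximity."""
    n = adj.shape[0]
    Y0 = np.full((n, k), 1.0 / k)          # uninformative fallback assignment
    seeds = np.flatnonzero(labeled)
    for u in range(n):
        if labeled[u]:
            Y0[u] = Y_star[u]
            continue
        sims = np.array([adamic_adar(adj, u, v) for v in seeds])
        if sims.sum() > 0:                 # normalize scores into [0, 1]
            Y0[u] = (sims / sims.sum()) @ Y_star[seeds]
    return Y0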
4.2 Update Y

As introduced earlier, we perform a vertex-centric update of $Y$. That is, in each iteration, we focus on minimizing the following sub-objective:

$$J(Y(u)) = \sum_{v \in N(u)} \big(G(u,v) - Y(u)^T B_{t(u)t(v)} Y(v)\big)^2 + \sum_{v \notin N(u),\, t(v) \ne t(u)} \big(Y(u)^T B_{t(u)t(v)} Y(v)\big)^2 + \mathbf{1}_{V^L}(u)\, \beta \|Y(u) - Y^*(u)\|_F^2 \quad (4)$$

where $\mathbf{1}_A(x)$ is the indicator function, which is one if $x \in A$ and zero otherwise, and $N(u)$ is the set of neighbors of vertex $u$.

In the following, we adopt two representative update rules, the multiplicative rule and the additive rule, to derive the optimal solution of Eq. (4).

Lemma 1 (Multiplicative rule for Y). $Y$ can be approximated via the following multiplicative update rule:

$$Y(u) = Y(u) \circ \sqrt{\frac{\sum_{v \in N(u)} G(u,v)\, B_{t(u)t(v)} Y(v) + \beta \mathbf{1}_{V^L}(u)\, Y^*(u)}{A_{t(u)} Y(u) + \beta \mathbf{1}_{V^L}(u)\, Y(u) + \epsilon}} \quad (5)$$

where $\epsilon > 0$ is a very small positive value (e.g., $10^{-9}$), and $A_t$ is defined as follows:

$$A_t = \sum_{v \notin V_t} B_{t\,t(v)}\, Y(v) Y(v)^T\, B_{t\,t(v)}^T \quad (6)$$

Proof: The proof can be derived in the spirit of the classic multiplicative algorithm [25] for non-negative matrix factorization, using the KKT conditions [23]. Details are presented in Appendix 7.1.

Lemma 2 (Additive rule for Y). An alternative approximate solution for $Y$ can be derived via the following additive rule:

$$Y(u)^{r+1} = \max\Big(\epsilon,\; Y(u)^r + 2\eta \big(\textstyle\sum_{v \in N(u)} G(u,v)\, B_{t(u)t(v)} Y(v) - A_{t(u)} Y(u)^r + \beta \mathbf{1}_{V^L}(u)\,(Y^*(u) - Y(u)^r)\big)\Big) \quad (7)$$

where $\eta$ is the step size and $A_t$ is defined in Eq. (6).

Proof: It can be derived directly by substituting the derivative of $J(Y(u))$ into the standard gradient descent update rule.
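The sketch below transcribes the vertex-centric multiplicative rule of Lemma 1 with the shared term $A_t$ of Eq. (6) cached once per sweep. It is a dense illustration under our own naming; it assumes every relation block is stored in both orientations (so G_blocks[(t, t2)][u] lists u's neighbors of type t2), and a real implementation would iterate over sparse neighbor lists instead.

import numpy as np

EPS = 1e-9  # the small constant epsilon in Eq. (5)

def precompute_A(Y, B, types):
    """Eq. (6): A_t = sum over vertices v of other types of
    B_{t,t(v)} Y(v) Y(v)^T B_{t,t(v)}^T.

    Using sum_v Y(v) Y(v)^T = Y_t2^T Y_t2, each type pair contributes
    B Y_t2^T Y_t2 B^T, shared by every t-type vertex.
    """
    A = {}
    for t in types:
        k = Y[t].shape[1]
        A[t] = np.zeros((k, k))
        for t2 in types:
            if t2 != t and (t, t2) in B:
                M = B[(t, t2)] @ Y[t2].T      # k x n_t2
                A[t] += M @ M.T
    return A

def update_Y_vertex(u, t, G_blocks, Y, B, A, Y_star, is_seed, beta):
    """One application of Eq. (5) to the row Y(u)."""
    k = Y[t].shape[1]
    num = np.zeros(k)
    for (t1, t2), G in G_blocks.items():
        if t1 == t:                           # sum_{v in N(u)} G(u,v) B Y(v)
            num += (G[u] @ Y[t2]) @ B[(t1, t2)].T
    den = A[t] @ Y[t][u]
    if is_seed:
        num += beta * Y_star[t][u]
        den += beta * Y[t][u]
    Y[t][u] *= np.sqrt(num / (den + EPS))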
Step size for the additive rule. We use Nesterov's method [17, 28, 43] to compute the step size $\eta$, which can be estimated using the Lipschitz constant $L$ of $\nabla J(Y(u))$; see Appendix 7.2.

4.3 Update B

In the following, we present the detailed update rules for the propagation matrices $B$.

Lemma 3 (Multiplicative rule for B). $B$ can be derived via the following update rule:

$$B_{tt'} = B_{tt'} \circ \sqrt{\frac{Y_t^T G_{tt'} Y_{t'}}{Y_t^T Y_t B_{tt'} Y_{t'}^T Y_{t'} + \epsilon}} \quad (8)$$

Proof: The proof of this lemma is similar to that of Lemma 1; see Appendix 7.3.

Lemma 4 (Additive rule for B). An alternative approximate solution for $B$ can be derived via the following additive rule:

$$B_{tt'} = \max\big(\epsilon,\; B_{tt'} + 2\eta_b (Y_t^T G_{tt'} Y_{t'} - Y_t^T Y_t B_{tt'} Y_{t'}^T Y_{t'})\big) \quad (9)$$

where $\eta_b$ again denotes the step size.

Proof: It can be derived directly by substituting the derivative of $J(B_{tt'})$ into the standard gradient descent update rule. Similar to the computation of $\eta$ with Nesterov's gradient method, $\eta_b$ can be computed with the Lipschitz constant $L_b = 2\|Y_{t'}^T Y_{t'} Y_t^T Y_t\|_F$ of $\nabla J(B_{tt'})$; see Appendix 7.4.
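For reference, the multiplicative $B$ update of Eq. (8) is a three-line computation per block (dense sketch, names ours):

import numpy as np

EPS = 1e-9

def update_B(G, Y_t, Y_t2, B):
    """Eq. (8) for one block B_tt': multiply by sqrt(numerator/denominator)."""
    num = Y_t.T @ G @ Y_t2                  # Y_t^T G_tt' Y_t'
    den = Y_t.T @ Y_t @ B @ Y_t2.T @ Y_t2   # Y_t^T Y_t B Y_t'^T Y_t'
    return B * np.sqrt(num / (den + EPS))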
4.4 Comparison and Analysis

We first show that the solution $Y$ returned by the proposed multiplicative rule is identical to that of the traditional multiplicative rule, as proved in the following lemma.

Lemma 5. Updating the label assignment $Y$ vertex by vertex using Lemma 1 is identical to the following traditional multiplicative rule [42]:

$$Y_t = Y_t \circ \sqrt{\frac{\sum_{t' \ne t} G_{tt'} Y_{t'} B_{tt'}^T + \beta S_t Y_t^0}{\sum_{t' \ne t} Y_t B_{tt'} Y_{t'}^T Y_{t'} B_{tt'}^T + \beta S_t Y_t}}$$

where $S \in \mathbb{R}^{n \times n}$ is the label indicator matrix, with $S_{uu} = 1$ if $u \in V^L$ and zero for all other entries, and $S_t$ is the sub-matrix of $S$ for the $t$-type vertices.

Proof: The detailed proof is shown in Appendix 7.5.

We then analyze the time complexity of computing each basic operator in a single iteration. As outlined in Table 3, each basic operator can be computed efficiently in real sparse networks. In addition, because of our pre-computation of $A_t$, the multiplicative and additive update rules have the same near-linear computational cost per iteration, which is much smaller than that of many traditional multiplicative rules for $Y$. For example, applying the multiplicative rule proposed by Zhu et al. [42] leads to $O(n_a n_b k + n_a n_c k + n_b n_c k)$ computational complexity per iteration.

Table 3: Time complexity of the basic operators, where n is the number of nodes, m is the number of edges, and k is the number of label classes.

        B            Y            A_t
Multi   O((n+m)k)    O((n+m)k)    O(Σ_{t'≠t} n_{t'} k²)
Addi    O((n+m)k)    O((n+m)k)    O(Σ_{t'≠t} n_{t'} k²)

Convergence. Let us first examine the general convergence of our label inference framework. Based on Corollaries 1, 2 and 3 of [21], any limit point of the sequence generated by Algorithm 1 is a stationary point if the update rules remain non-zero and attain the optimum of each subobjective. Therefore, in general, if an update rule in Algorithm 1 is optimal for both subobjectives, it leads to a stationary point. We next analyze the convergence properties when using the multiplicative and the additive update rules. Although neither the multiplicative nor the additive rules used in this work are optimal for the subproblems, they still have very nice convergence properties. As proved in Theorem 2.4 of [6], the proposed additive rules still converge to a stationary point. For the multiplicative rule, we conclude that the proposed multiplicative updating rules guarantee that the value of the objective function is non-increasing, and thus the algorithm converges to a local optimum. This is because the proposed multiplicative rules are identical to the traditional multiplicative rules, as proved in Lemma 5, and Zhu et al. [42] have proved that the value of the objective function is non-increasing under the traditional multiplicative rules.

5. INCREMENTAL LABEL INFERENCE

In this section, we present our solution to a fundamental research question of practical importance: how can we support fast incremental updates upon graph updates such as new labels and/or new nodes and edges? This matters because, in practice, graphs and labels change continuously. In the following, we first develop an incremental algorithm that adaptively updates label assignments upon new data, in which we can control the trade-off between efficiency and accuracy. We then explore another interesting question: under which conditions is it faster to perform an incremental update than to recompute from scratch? To address this issue, we propose a utility function that examines the reward and the cost of both update operations. With the utility function, our framework is able to automatically determine the "best" strategy based on the level of changes.
5.1 Incremental Update Algorithm

Our incremental update algorithm supports both graph changes and label changes. The first scenario includes vertex/edge insertions and deletions, while label changes include receiving additional labels or corrections of noisy labels. With the proliferation of crowdsourcing, whenever we obtain additional explicit labels from users or identify noisy labels based on user feedback, we can update the label assignment of each vertex.

A simple approach to graph/label updates is to re-run our label inference algorithm on the entire graph. However, this can be computationally expensive. We thus propose an incremental algorithm, in which we perform partial updates instead of full updates for each vertex. Our incremental algorithm is built upon the intuition that even with graph and label changes, the majority of vertices tend to keep their label assignments or not change them much. Therefore, we perform a "lazy" adjustment, i.e., we keep the old label assignments and update only a small candidate set of changed vertices. Unfortunately, it is very challenging to design an efficient and effective incremental label propagation algorithm, because in label propagation the adjustment of existing vertices affects their neighbors, and even vertices that are far away. This requires an effective strategy for choosing which subset of vertices to adjust, so as to guarantee the performance gain of the incremental algorithm. Let us use the following example to illustrate the challenge.

Figure 4: Two new vertices are inserted into the tripartite graph of Fig. 3.

Example 1. Fig. 4 shows an example of initializing the candidate set of vertices to be adjusted when the graph receives updates. Two new vertices, 9 and 10, are inserted with new edges into the tripartite graph shown in Fig. 3(a), which consequently leads to vertices 5, 6 and 8 receiving new links from vertices 9 and 10. Therefore, the subset of vertices {5, 6, 8, 9, 10} (i.e., the subgraph bounded by the red box) receives graph changes, and their label embeddings require updates. However, updating the position of vertex 6 in the latent space might cause a change in that of vertex 2, or even in all of the remaining vertices. It is unclear to what extent we should propagate these changes: being too aggressive leads to an entire update (equivalent to recomputing), while being too conservative leads to a great loss in accuracy.

Overview of the incremental algorithm. We develop an incremental algorithm based on mean field theory, which assumes that the effect of all other individuals on any given individual can be approximated by a single averaged effect. Hence we choose the candidate set of vertices based on how much they differ from the averaged effect. The overall process is outlined in Algorithm 2. We first take the small portion of changes ∆V as our initial candidate set of vertices cand (Line 1), where ∆V denotes the set of changed vertices (e.g., {5, 6, 8, 9, 10} in Fig. 4), including new and deleted vertices, vertices that have new or deleted edges, and vertices that receive updated ground truth labels. Next, we iteratively perform a conditioned label assignment update for each vertex in the candidate set (Lines 6-7), as well as an update of the candidate set cand itself (Lines 8-10). When updating the candidate set, we include a neighbor of an existing candidate vertex in cand only if it satisfies the condition in Line 9, which is based on the pre-computed values of w and σ (Line 4). Here w_{tt'} denotes the averaged effect of any given individual, σ_{tt'} denotes the standard deviation of the effects of any given individual, and θ is a confidence level parameter for the trade-off between efficiency and accuracy. We make the assumption that the propagation matrix B is an inherent property that can be accurately estimated from the sampled old data (i.e., B does not change with the new data). The details of the candidate vertex update are presented below.

Algorithm 2 The incremental label inference algorithm with changes
Input: Graph G, old label matrix Y, a few ground truths Y*, confidence level θ ∈ [0, 1), and changes ∆V
Output: New label matrix Y_n
01: cand = ∆V
02: for each u ∈ G
03:   Y_n(u) = Y(u)
04: w_tt' = avg_{(u,v)∈E_tt'} Y(u)^T B_tt' Y(v);  σ_tt' = std_{(u,v)∈E_tt'} Y(u)^T B_tt' Y(v)
05: repeat
06:   for each vertex u ∈ cand
07:     update Y_n(u) (see Section 4.2)
08:     for each v ∈ N(u), v ∉ cand
09:       if |Y_n(u)^T B_t(u)t(v) Y_n(v) − w_t(u)t(v)| ≥ sqrt(1/(1−θ)) σ_t(u)t(v)
10:         cand = cand ∪ {v}
11: until Y_n converges
12: return Y_n
Update of the candidate vertex set. The candidate set cand denotes the subset of vertices whose label assignments require updates due to graph/label changes. For each current vertex u in cand, we update its label assignment and examine its propagation behavior toward each of its neighbor vertices v. Intuitively, the modification of one vertex can cause the relations to its neighbors to be adjusted, whereas the general behavior between the corresponding two types of vertices should not change much. More specifically, let $w_{tt'} = \mathrm{avg}_{(u,v) \in E_{tt'}} Y(u)^T B_{tt'} Y(v)$ denote the averaged effect of an individual between $t$-type and $t'$-type vertices, and let $\sigma_{tt'} = \mathrm{std}_{(u,v) \in E_{tt'}} Y(u)^T B_{tt'} Y(v)$ denote the standard deviation of the individual effects between $t$-type and $t'$-type vertices. If the estimated effect between u and v after the adjustment differs significantly from the averaged effect $w_{t(u)t(v)}$ within the same type pair, we add vertex v to cand. The significance is evaluated using a threshold composed of the confidence parameter θ and $\sigma_{tt'}$.

Figure 5: Vertex 7 of the tripartite graph in Fig. 3 obtains a new label.

Example 2. Consider again the example shown in Fig. 4: the graph changes trigger changes of the embedded positions of vertices {5, 6, 8, 9, 10} (i.e., the vertices in the box). The movement of the changed vertices in the label embedding space consequently causes their neighbor vertices to leave their old embedded positions. For example, if vertex 6 is moved further toward the red axis, its neighbor vertex 2 might be required to move away from the green axis too. Similarly, in Fig. 5, when vertex 7 receives a new label, vertex 7 is pushed closer to the green axis, which further influences the movement of both vertex 1 and vertex 4. Meanwhile, the new label also strengthens the green label propagation to vertex 1.
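A small sketch of this mean-field filter (Lines 4 and 9 of Algorithm 2), with our own helper names: edge effects $Y(u)^T B Y(v)$ are summarized once by their mean and standard deviation, and a neighbor joins cand only when its re-estimated effect deviates by at least $\sqrt{1/(1-\theta)}$ standard deviations.

import numpy as np

def edge_effect_stats(edges, Y, B):
    """Line 4: mean and std of Y(u)^T B Y(v) over the observed edges (u, v)."""
    vals = np.array([Y[u] @ B @ Y[v] for u, v in edges])
    return vals.mean(), vals.std()

def should_expand(u, v, Y_new, B, w, sigma, theta):
    """Line 9: Chebyshev-style deviation test; by Theorem 1 below, at most
    a (1 - theta) fraction of neighbors passes it in expectation."""
    effect = Y_new[u] @ B @ Y_new[v]
    return abs(effect - w) >= np.sqrt(1.0 / (1.0 - theta)) * sigma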
The have: intuitionisthatifthedistributionofnewdatasignificantly Pr(|X−wtt(cid:48)|≥(cid:112)1/(1−θ)σtt(cid:48))≤(1−θ) (11) variesfromthatofolddata,anincrementalupdateresultsin higherinformationloss;whilethenewdataareverysimilar Eq.(11)exactlyexaminesthemaximumboundoftheprob- toolddata,anincrementalupdateleadstomuchlessinfor- ability of reaching Line 9 in Algorithm 2. Since the proba- mationloss. Specifically,weusethegraphproximityheuris- bility of reaching Line 9 in Algorithm 2, is identical to the ticdefinedinEq.(3)tosimulatethesimilaritybetweennew probabilitythataneighborofanycandidatevertexrequires data and old data, and consequently the information loss, label assignment update, we complete the proof. (cid:4) which leads to the following equations: In conclusion, the parameter θ roughly provides a utility that controls the estimated speed up of an incremental al- D(Y|fR,Y|fI,V+)=| avg sim(u,v,G∪∆G) gorithmtowardare-computingapproach. Alargervalueof u∈V−,v∈V+ (14) θ indicates that more neighbors are filtered out for update, − avg sim(u,v,∆G)| thus leading to a higher confidence that the incremental al- u∈V−,v∈V+ gorithmismoreefficientthanare-computingapproach. On another hand, a larger value of θ leads to fewer updates of D(Y|fR,Yo,V−)=| avg sim(u,v,G∪∆G) u∈VL,v∈V− neighbors and subsequently lower accuracy. From this per- (15) spective, the parameter θ allows us to choose a good trade- − avg sim(u,v,G)| u∈VL,v∈V− off between computational cost and solution accuracy. Thoughinformationlossmighthappenwhenusinganin- 5.2 ToRe-computeorNot? crementalalgorithm,muchlesstimeisconsumedincompu- In the online label inference, we continually receive new tation compared to any re-computing operation, especially dataand/ornewevidence(labels). Conditioningonthenew when the percentage of changes is very small. dataandnewevidence,wehavetwochoices: wecanrecom- putethelabelassignmentforallthevertices,usingfulllabel 6. EXPERIMENTS inference; or, we can fix some of the previous results, and only update a certain subset of the vertices using an incre- 6.1 DatasetsandSettings mentalalgorithm. Tounderstandtheconsequencesofusing Weevaluatetheproposedapproachesonfourrealdatasets anincrementalalgorithm,wemustanswerabasicquestion: withthreedifferentclassificationtasks: sentimentclassifica- how much accuracy loss are incurred and how much speed tion, topic classification and rating classification. Among upareachievedbyanincrementalalgorithmcomparedtoa the four datasets, Prop 30 and Prop 37 are two tripartite re-computing approach? graphscreatedfrom2012NovemberCaliforniaBallotTwit- We thus define the utility function for the incremental ter Data [42], each of which consists of tweet vertices, user algorithm f as follows: I vertices,andwordvertices. Thelabelclassesaresentiments: U(fI)=usGain(fI,fR)−uaLoss(fI,fR) (12) positive,negative,andneutral. ThePubMeddataset[24]is represented as a tripartite graph with three types of ver- where Gain(fI,fR) = (2|G−∪θ)∆|∆GG|| is the the computational tices: papers, reviewers, and words. Each paper/reviewer gain achieved by the incremental algorithm f compared is associated with multiple labels, which denote the set of I to the re-computing algorithm f , ∆G is the subgraph in- subtopics. The MovieLen dataset [31] represents the folk- R duced by the changed vertices ∆V, Loss(f ,f ) is the in- sonomy information among users, movies, and tags. Each I R formation loss of using incremental algorithm f instead of movie vertex is associated with a single class label. 
Here I 8 we use three coarse-grain rating classes, “good”, “neutral”, MRG ARG GRF MHV BHP and “bad” as the ground truth labels. A short description 0.8 ofLeaetchMdRatGa/seAtRisGsudmenmoateriztehdeinmuTlatbiplelic4a.tive/additive rule Accuracy 00..46 0.2 update with graph heuristic initialization. We compare our Prop 30 Prop 37 MovieLen PubMed approaches with three baselines: GRF [44], MHV [9] and MRG ARG GRF MHV BHP BHP [13]. GRF is the most representative traditional label 0.8 propagation algorithm (i.e., no B matrices or B matrices R 0.6 E are identity matrices), MHV is selected as a representative B 0.4 method that supports vertex-level heterogeneity (B matri- 0.2 ces are diagonal), and BHP denotes the label propagation Prop 30 Prop 37 MovieLen PubMed algorithmthatallowspropagation-levelheterophilyanduti- lizesasinglematrixB. Foralloftheseapproaches,webegin Figure 6: Classification quality comparisons. The with the same initialized state, and we use the same regu- higher accuracy (the lower balanced error rate), the larizations/or no regularizations for all approaches. Note better quality. that our goal in this paper is not to justify the perfor- manceofsemi-supervisedlearningfordifferentclassification tsauspkesrv(ivsaerdioluesarsnuirnvgeywsithhavfeewjuerstliafibeedletdhedaatdav)a,nbtuagteraotfhseermtio- 00000.....12345 x246811 0210−4 0000....011255 0000....00002468 0000....2468 0000....011255 propose a better semi-supervised label propagation algo- (a) Prop 30, MRG (b) Prop 37, ARG rithm for tripartite graphs. Therefore, we do not compare opuoWrrtaevpeepvcratoolaurcahmteeastchwheiinteheff.oecthtievrensuespserovfiesaedchmaeptphroodaschsuinchtearsmssupo-f 000...123 000...123 000...123 00000....00001234 000011.....24682 0011..55 (c) MovieLen, ARG (d) PubMed, MRG classificationaccuracy. Specifically,weselect[1%,5%,10%] ofverticeswithgroundtruthlabelsasthesetofseedlabeled vertices,andthenrundifferentlabelpropagationalgorithms Figure 7: The automatically learned propagation overtheentirenetworktolabeltheremainingvertices. Note matrices B by the better algorithm between ARG thattheselectionofseedlabeledverticesisnotthefocusof and MRG (Best viewed in color). this work, and thus we simply adopt the degree centrality to select the top [1%, 5%, 10%] of vertices as seed nodes. 6.2 StaticApproachEvaluation For both single- and multi-class labels, we assign label l to i a vertex u if Y(u,l ) > 1/k and validate the results with Inthissection,weevaluatetheperformanceofAlgorithm1 i ground truth labels. We represent classification results as a in terms of convergence, efficiency and classification accu- contingency matrix A, with A for i,j ∈ L = {l ,··· ,l } racy. ij 1 k wherekisthenumberofclasslabels,andA isthenumber ij oftimesthatavertexoftruelabell isclassifiedaslabell . Question 1 Accuracy: HowdoesAlgorithm1performcom- i j With the contingency matrix A, the classification accuracy pared to the baselines? is defined as: Accuracy= (cid:80)(cid:80)iijAAiiij. Theclassificationaccuracycannotwellevaluatetheper- Result 1 Algorithm 1 outperforms the baselines in terms of formance if label distribution is skewed. Therefore, we also classificationaccuracyandbalancederrorratewhendataexhibit use Balanced Error Rate [30] to evaluate the classification heterophilyand/orwithmulti-classlabels. quality. The balanced error rate is defined as BER = 1− Fig.6reportstheclassificationaccuracyandbalanceder- k1(cid:80)i (cid:80)AjiAiij. rcoornvraertgeecnocmeptaorleisroannscefovraallulethteoa1p0p−r6o.acThehse. 
Hpaerraemweetesretβthies set to 5. This is because in our preliminary experiments, Table 4: The statistics of datasets. “single label” we notice that the accuracy increases when β is increased denotes whether each vertex can be associate with from1to5,butafterthat,increasingβ doesnotleadtosig- one or more labels; “max label” denotes the class nificant changes in accuracy. Clearly, our approaches MRG label l that has maximum number of vertices and andARG,performsmuchbetterthantheotherapproaches m “% max label” denotes the percentage of vertices onProp30,MovieLen,PubMed,andperformsimilarlywith that have ground truth label l . GRF on Prop 37. The results validate the advantage of in- m Data Prop30 Prop37 MovieLen PubMed ferred B matrix in terms of supporting vertex-level hetero- #nodes 23,025 62,383 26,850 36,813 geneity, propagation-level heterogeneity and the multi-class #edges 168,131 542,028 168,371 263,085 label propagation. #classes 2 2 3 19 For example, Prop 37 is about labeling genetically engi- singlelabel Yes Yes Yes No neeredfoods, andthemajorityofpeoplehavepositiveatti- %na 3.6% 3.1% 14.9% 0.3% tude. AsshowninFig.7(b),alloftheBmatriceslearnedby %n 45.7% 54.5% 56.8% 0.6% b %nc 50.7% 42.4% 28.3% 99.1% ARG are diagonal matrices, which illustrates that the link %maxlabel 67% 90% 67% 23% formation exactly follows homophily assumption. Hence, on Prop 37, forcing B matrices as identity matrices such All the algorithms were implemented in Java 8 in a PC as GRF obtains comparable quality with our approaches. with i7-3.0HZ CPU and 8G memory. Moreover,althoughbothPubMed(seeFig.7(d))andProp 9 MRG ARG GRF MHV BHP ms) 10000 MRG ARG GRF MHV BHP √√alues (J/J)r0 0000 0....9999 .192468 √√alues (J/J)r0 00000.....99999 156789 Time per iteration ( 1000 e v 0.88 e v 0.94 Prop 30 Prop 37 MovieLen PubMed Objectiv 000...888246 Objectiv 000...999123 me (s) MRG ARG GRF MHV BHP 0 (a N2)0umPbre 4or0 opf ite3 6ra00tion s8 0r 100 0(b N2)0umPbre 4or0 opf ite 36ra07tion s8 0r 100 Total running ti 1 1000 Prop 30 Prop 37 MovieLen PubMed √√es (J/J)r0 000 ...1789 √√es (J/J)r0 0 .19 Fitiegruarteio9n: iEsffiinciemncilylisceocmonpdarisscoanle. Nanodtettohtaatltrimunenpinegr e valu 00..56 e valu 0.8 time is in second scale. ectiv 00..34 ectiv Obj 0.2 0 20 40 60 80 100 Obj 0.7 0 20 40 60 80 100 running time in Fig. 9. Because on average they converge Number of iterations r Number of iterations r 2-3 times faster than all the baselines, computationally ex- (c) MovieLen (d) PubMed pensiveperiterationduetotheincurredadditionalcostfor computing the propagation matrices B, the proposed algo- rithms are still very efficient for large-scale data. Figure 8: Convergence comparison. y axis: the ra- √ √ tio between J and J , where J is the objective r 0 r value for rth iteration and J0 is the objective value Table 5: The effect of graph regularization. of initialization. Prop30 Prop37 w w/o w w/o MRG 0.869 0.848 0.935 0.908 37 exactly follow homophily assumption, the proposed ap- ARG 0.867 0.781 0.942 0.928 proachesperformbetterthanotherapproachesonPubMed. This is because PubMed has multi-class labels and our ap- proaches well support multi-class label propagation com- Question 4 Regularization: Whatistheeffectofthegraph pared to other approaches. On Prop 30 and MovieLen, the regularization? B matricesareamixtureofdifferentformsofmatrices. Un- der this situation, our approaches MRG and ARG perform betterthanalltheapproachesincludingBHP(usingasingle Result 4 We validate that graph regularization term is arbitrary B matrix). 
helpful for sentiment classification tasks on Prop 30 and Prop 37. Question 2 Convergence: Do multiplicative and additive Intuitively user-to-user graph (e.g., friendship graph) or rules in Algorithm 1 converge? document graph (e.g., citation graph) will be very helpful for labeling users or documents. Unfortunately, we do not Result 2 Bothmultiplicativeandadditiverulesareguar- havesuchgraphsforMovieLenandPubMed. Therefore,al- anteed to converge into local optima and their convergence though our framework is very general and supports various rate in terms of objective values are much faster than the regularization, we only compare the classification accuracy baselines. w/o graph regularization on Prop 30 and prop 37 (i.e., the Instead of fixing convergence tolerance value, now we fix two data sets that have additional user to user re-tweeting themaximumiterationnumberto100,andvalidatethecon- graphs). With the additional regularization, the classifica- vergence performance of the proposed algorithms. Fig. 8 tionaccuracyincreasesby6.1%onProp30and2%onProp shows that the objective values are non-increasing using 37. both multiplicative rules and additive rules. In addition, Question 5 MRG V.S. ARG: Is MRG perform better the proposed algorithms decrease the objective value much than ARG, or vice verse? faster than other algorithms. Question 3 Efficiency: Are the proposed algorithms scal- able? Result 5 The two update rules exhibit similar behaviors in terms of accuracy, convergence and running time. Interestingly,weobservethatthereisnoclearwinnerbe- Result 3 Ineachiteration,theproposedalgorithmsMRG tween the two update rules. The result demonstrates that and ARG require more computational cost than the base- ourunifiedalgorithmcanserveasaframeworkforcompar- lines. However, since they converge much faster than other ison between different update rules. approaches, the total running time of the proposed algo- rithms, are still faster than the baselines. 6.3 IncrementalApproachEvaluation Weagainfixtheconvergencetolerancevalueto10−6,and We justify the advantage of our incremental approaches report the average running time per iteration and the total in terms of guaranteed speed up and decent classification 10
