Label Propagation on K-partite Graphs with Heterophily
Dingxiong Deng (Dept. of Computer Science, Univ. of Southern California) dingxiod@usc.edu
Fan Bai (School of Computer Science, Fudan University)
Yiqi Tang (School of Computer Science, Fudan University)
Shuigeng Zhou (School of Computer Science, Fudan University) sgzhou@fudan.edu.cn
Cyrus Shahabi (Dept. of Computer Science, Univ. of Southern California) shahabi@usc.edu
Linhong Zhu (Information Sciences Institute, Univ. of Southern California) linhong@isi.edu
arXiv:1701.06075v1 [cs.LG] 21 Jan 2017

ABSTRACT

In this paper, for the first time, we study label propagation in heterogeneous graphs under the heterophily assumption. Homophily label propagation (i.e., two connected nodes share similar labels) in homogeneous graphs (with the same types of vertices and relations) has been extensively studied before. Unfortunately, real-life networks are heterogeneous: they contain different types of vertices (e.g., users, images, texts) and relations (e.g., friendships, co-tagging), and allow each node to propagate both the same and the opposite copy of labels to its neighbors. We propose a K-partite label propagation model to handle the mystifying combination of heterogeneous nodes/relations and heterophily propagation. With this model, we develop a novel label inference algorithm framework with update rules in near-linear time complexity. Since real networks change over time, we devise an incremental approach, which supports fast updates for both new data and evidence (e.g., ground truth labels) with guaranteed efficiency. We further provide a utility function to automatically determine whether an incremental or a re-modeling approach is favored. Extensive experiments on real datasets have verified the effectiveness and efficiency of our approach, and its superiority over the state-of-the-art label propagation methods.

[Figure 1 omitted: a tripartite network with user vertices u1, u2; tweet vertices t1, t2, t3; and word vertices w1, w2, w3, w4.] Figure 1: An example of sentiment labeling on a heterogeneous network with three types of vertices: users, tweets and words. Solid edges denote observed links, and dashed edges represent label propagation directions. The width of dashed edges distinguishes strengths of propagation. Green color: positive label; red color: negative label; homophily: propagate the same copy of labels; and heterophily: propagate the opposite copy of labels.

1. INTRODUCTION

Label propagation [44] is one of the classic algorithms to learn the label information for each vertex in a network (or graph). It is a process in which each vertex receives labels from its neighbors in parallel, then updates its labels, and finally sends the new labels back to its neighbors. Recently, label propagation has received renewed interest from both academia and industry due to its various applications in many domains such as spam detection [1], fraud detection [10], sentiment analysis [15], and graph partitioning [35]. Different algorithms [11, 19, 33, 39, 41, 44] have been proposed to perform label propagation on trees or arbitrary graphs. All of these traditional algorithms simply assume that all vertices are of the same type and are restricted to only a single pairwise similarity matrix (graph).

Unfortunately, many real networks such as social networks are heterogeneous systems [3, 32, 38] that contain objects of multiple types, interlinked via various relations. Considering the heterogeneity of data, it is tremendously challenging to analyze and understand such networks through label propagation. Applying traditional label propagation approaches directly to heterogeneous graphs is not feasible for the following reasons. First, traditional approaches neither support label propagation among different types of vertices, nor distinguish propagation strengths among various types of relations. Consider the example of sentiment labeling shown in Fig. 1, where the label of each tweet is estimated using the labels of words and users. In addition, the label information of users is much more reliable than that of words in terms of deciding the labels of tweets [42], and thus the user vertices should have stronger propagation strengths than word vertices. Second, traditional approaches do not support heterophily propagation. As illustrated in Fig. 1, in sentiment label propagation, traditional approaches assume that if a word has a positive label, then its connected tweets also have positive labels. However, it
is reasonable that a tweet connected to a positive word is negative due to sarcasm. There are a few notable exceptions [9, 13, 20] (see more in related works) that support either heterogeneous types of vertices or heterophily propagation, but not both. Last but not least, all of the current approaches simply assume that the propagation type (e.g., a homophily or heterophily propagation) is given as an input, though in practice this is difficult to obtain from observation.

[Figure 2 omitted: (a) a folksonomy tripartite graph linking users, items, and tags; (b) a text-corpus tripartite graph linking words, documents, and authors.] Figure 2: Examples of ubiquitous tripartite graphs.

Table 1: A summary of label propagation methods. H-V denotes whether a method supports heterogeneous types of vertices, H-P denotes whether a method supports heterophily in label propagation, Auto means whether the heterophily matrix is automatically learned or predefined, Incre denotes whether the proposed label propagation algorithm supports incremental update upon new data or new labels, and Joint denotes whether the proposed approach allows a vertex to be associated with multiple labels.

Method                             H-V  H-P  Auto  Incre  Joint
[4] [11] [33] [34] [39] [41] [44]  no   no   no    no     no
[7]                                no   no   no    no     yes
[9] [20]                           yes  no   no    no     ?
[13] [37]                          no   yes  no    yes    no
Proposed                           yes  yes  yes   yes    yes

In this paper, we study the problem of labeling nodes with categories (e.g., topics, communities, sentiment classes) in a partially labeled heterogeneous network. To this end, we first propose a K-partite graph model as the knowledge representation for heterogeneous data. Many real-life data examples naturally form K-partite graphs with different types of nodes and relations. To provide a few examples, as shown in Fig. 2, in a folksonomy system, the triplet relations among users, items and tags can be represented as a tripartite graph; the document-word-author relation can also be modeled as another tripartite graph with three types of vertices. A K-partite graph model nicely captures the combination of vertex-level heterogeneity and heterogeneous relations, which motivates us to focus on label propagation on K-partite graphs. That is, given an observed K-partite graph and a set of very few seed vertices that have ground-truth labels, our goal is to learn the label information of the large number of remaining vertices. Even though K-partite graphs are ubiquitous and the problem of label propagation on K-partite graphs has significant impact, this area is much less studied as compared to traditional label propagation in homogeneous graphs, and thus various modeling and algorithmic challenges remain unsolved.

To address the modeling challenges, we develop a unified K-partite label propagation model with both vertex-level heterogeneity and propagation-level heterogeneity. Consider the sentiment example in Fig. 1: our model allows a tweet vertex to receive labels from both words and users, but automatically gives higher weights to the latter through our propagation matrix. The propagation-level heterogeneity, which is reflected by supporting homophily, heterophily and mixed propagation, is automatically learned in our model. To infer our model, we first propose a framework that supports both multiplicative [25] and additive rules [18] (i.e., projected gradient descent) in a vertex-centric manner with near-linear time complexity. Because in practice graphs and labels can change continuously, we then study how and when we should apply label propagation to handle the changing scenarios. We thus devise a fast incremental algorithm (i.e., assigning labels for new data, or updating labels upon feedback), which performs partial updates instead of re-running the label inference from scratch. We not only can control the trade-off between efficiency and accuracy via the confidence level parameter for speedup, but also can determine whether we should apply our incremental algorithm through a utility function.

The contributions of this work are as follows:

1. Problem formulation: we show how real-life data analytic tasks can be formulated as the problem of label propagation on K-partite graphs.

2. Heterogeneity and heterophily: we propose a generic label propagation model that supports both heterogeneity and heterophily propagation.

3. Fast inference algorithms: we develop a unified label propagation framework on K-partite graphs that supports both multiplicative and additive rules with near-linear time complexity. We then propose an incremental framework, which supports much faster updates upon new data and labels than re-computing from scratch. In order to strike a balance between efficiency and effectiveness, we introduce a confidence level parameter for speedup. We further develop a utility function to automatically choose between an incremental and a re-computing approach.

4. Practical applications: we demonstrate with three typical application examples that various classification tasks in real scenarios can be solved by our label propagation framework with K-partite graphs.

2. RELATED WORKS

In this section, we first provide an extensive (but not exhaustive) review of the state-of-the-art label propagation approaches, and then discuss related works on heterogeneous graph representation using K-partite graphs.

2.1 Label Propagation

We summarize a set of representative label propagation algorithms in Table 1. In the following, we provide more details about each approach, with emphasis on their advantages and disadvantages. First we consider how these methods support propagating labels in heterogeneous networks. Various types of algorithms have been proposed to perform label propagation on trees or arbitrary graphs such
as belief propagation [11] [39], loopy belief propagation [19], Gaussian Random Field (GRF) [44], MP [33], MAD [34], and local consistency [41]. All of these algorithms simply assume that all vertices are of the same type and are restricted to only a single graph.

Recently, with the tremendous increase of heterogeneous data, label propagation approaches on heterogeneous networks have been developed. Jacob et al. [20] focused on learning a unified latent space representation through supervised learning for heterogeneous networks. Ding et al. [9] proposed a cross propagation algorithm for K-partite graphs, which distinguishes vertices of different types, and propagates label information from vertices of one type to vertices of another type. However, they still make the homophily propagation assumption, that is, two connected nodes have similar labels (representations), even though they are of different types. On the other hand, Gatterbauer et al. [13] proposed a heterophily belief propagation algorithm for homogeneous graphs with the same type of vertices (e.g., a user graph). Yamaguchi et al. [37] also proposed a heterophily propagation algorithm that connects to random walk and Gaussian Random Field. In both approaches, they assumed that even if a user node is labeled as fraud, it cannot simply propagate the fraud label to all its neighbors. However, their approaches do not support vertex-level heterogeneity and use the same type of propagation over the entire network. In addition, the propagation matrices (e.g., a diagonal matrix for homophily, an off-diagonal matrix for heterophily) are required to be predefined based on observation, rather than automatically learned. In this work, we propose a unified label inference framework, which supports both vertex-level heterogeneity and propagation-level heterophily (i.e., different types of propagation across heterogeneous relations). Furthermore, our framework is able to automatically learn the propagation matrices from the observed networks.

Current research has focused on addressing algorithmic issues in the problem of label propagation such as incremental and joint learning for label inference. Gatterbauer et al. [13] proposed a heuristic incremental belief propagation algorithm for new data. However, more research is required to find provable performance guarantees for the incremental algorithm. Chakrabarti et al. [7] proposed a framework with joint inference of label types such as hometown, current city, and employers, for users connected in a social network. To advance existing work, we propose an incremental framework which supports adaptive updates upon both new data and labels. In addition, our algorithm is guaranteed to achieve speedup compared to re-computing algorithms with a certain confidence. It also supports multi-class label joint inference and provides a better understanding of the latent factors that cause link formations in the observed heterogeneous networks.

2.2 K-partite Graphs

K-partite graph analysis has wide applications in many domains such as topic modeling [27], community detection [29], and sentiment analysis [42]. Most of these works use tripartite graph modeling as a unified knowledge representation for heterogeneous data, and then formulate the real data analytic tasks as the corresponding tripartite graph clustering problem. For example, Long et al. [27] proposed a general model, the relation summary network, to find the hidden structures (the local cluster structures and the global community structures) from a K-partite graph; Zhu et al. [42] addressed both static tripartite graph clustering and online tripartite graph clustering with matrix co-factorization. There are other works which study theoretical issues such as competition numbers of tripartite graphs [22].

To summarize, K-partite graph modeling and analysis has been studied from different perspectives due to its potential in various important applications. Yet studies on learning with K-partite graph modeling are limited. In this work, we formulate a set of traditional classification tasks such as sentiment classification, topic categorization, and rating prediction as the label propagation problem on K-partite graphs. With the observed K-partite graphs, our label inference approach is able to obtain decent accuracy with very few labels.

3. PROBLEM FORMULATION

Let t, t' be specific vertex types. A K-partite graph G = <∪_{t=1..K} V_t, ∪_{t=1..K} ∪_{t'=1..K} E_tt'> has K types of vertices, and contains at most K(K−1)/2 two-way relations. We also use the notation G to denote the adjacency matrix representation of a K-partite graph, and use G_tt' to denote the subgraph/matrix induced by the set of t-type vertices V_t and the set of t'-type vertices V_t'. For ease of presentation, Table 2 lists the notations we use throughout this paper.

Intuitively, the purpose of label propagation on a K-partite graph is to utilize the observed labels of seed vertices (denoted as V^L) to infer label information for the remaining unlabeled vertices (denoted as V^U). For each vertex v, we use a column vector Y(v) ∈ R^{k×1} to denote the probabilistic label assignment to vertex v and a matrix Y ∈ R^{n×k} to denote the probabilistic label assignment for all the vertices, where n is the number of nodes and k is the number of labels. Therefore, our objective is to infer the Y matrix so that each unlabeled vertex v ∈ V^U obtains an accurate estimation of its ground truth label.

We first build intuition about our model using a running example shown in Fig. 3(a). Suppose we only observe the links and very few labels of vertices, as shown in Fig. 3(b); the purpose of label propagation on tripartite graphs is then to utilize the observed links to infer label information. We introduce a matrix B ∈ R^{k×k} to represent the correlation between link formation and associated labels. Each entry B(l_i, l_j) denotes the likelihood that a node labeled as l_i is connected with another node labeled as l_j. Subsequently, in the label propagation process, each entry B(l_i, l_j) also denotes a proportional propagation that indicates the relative influence of nodes with label l_i on nodes with label l_j. To keep consistent with the label propagation process, here we interpret B as the propagation matrix.

Note that the B matrix is an arbitrary non-negative matrix without any restrictions. If the B matrix is diagonal (e.g., B from V_a to V_c in Fig. 3(d)), the link formation is consistent with the homophily assumption; while if the B matrix is off-diagonal, the link formation is more likely due to the heterophily assumption. In addition, as shown in Fig. 3(d), the propagation matrix between a- and b-type vertices is different from that between a- and c-type vertices. We thus let B_tt' denote the propagation matrix between t-type vertices and t'-type vertices.

Generally speaking, if we have prior knowledge about the propagation matrix B, then each unlabeled vertex could receive
proportional propagation from very few labeled vertices according to B. We propose to infer the propagation matrix and label assignment via embedding. We embed each vertex into a unified space (for all types of vertices), where each dimension of the latent space denotes a specific label and the link probability of two vertices is correlated to their labels (i.e., distance in the latent space). Fig. 3(c) shows a simplified two-dimensional space. With the label embedding, the label assignment and the correlation can be automatically learned.

[Figure 3 omitted: (a) ground truth, (b) observation, (c) label embedding, and (d) example propagation matrices for a tripartite graph with vertex sets V_a = {1, 2}, V_b = {3, 4, 5, 6}, V_c = {7, 8}.] Figure 3: An illustration of our label inference process with a tripartite graph example. Here we use different shapes/colors to distinguish types/labels. g: green, and r: red. In addition, the blue color denotes an overlapping assignment of both green and red.

Table 2: Notations and explanations.

Notations         Explanations
n/m/k/K           number of nodes/edges/labels/types
G/B/Y             graph/propagation/label assignment matrix
t(v)/d(v)/N(v)    the type/degree/neighbors of vertex v
Y(v) ∈ R^{k×1}    the label assignment for vertex v
M_tt'             a submatrix of matrix M between t-type vertices and t'-type vertices
η                 step size in the additive rule
ε                 a very small positive constant
θ                 parameter in the incremental algorithm
1_A(x)            indicator function
L                 Lipschitz constant
J(x)              objective function of x
J̃(x)              Lagrangian function of x

With these intuitions, we focus on the following problem:

Problem 1 (K-partite Graph Label Inference). Given a K-partite graph, a set of labels L, and a set of seed labeled vertices V^L with ground truth labels Y* ∈ R^{n_L×k} (n_L = |V^L|), the goal is to infer the label assignment matrix Y and the propagation matrix B such that

argmin_{Y,B≥0} { Σ_t Σ_{t'≠t} ||G_tt' − Y_t B_tt' Y_t'^T||_F^2
               + β Σ_{u∈V^L} ||Y(u) − Y*(u)||_F^2
               + λ regularizer(G, Y) }                           (1)

where t denotes the type of vertex, Y_t denotes a submatrix of Y, which gives the label assignment for the set of t-type vertices V_t, β and λ are parameters that control the contribution of different terms, and regularizer(G, Y) denotes a regularization approach such as graph regularization [5], sparsity [14], diversity [40], and complexity regularization.

In our objective function, the first term evaluates how well each B matrix represents the correlation between link formation G and associated labels Y, via computing the error between the estimated link formation probability and the observed graph. The second term Σ_u ||Y(u) − Y*(u)||_F^2 penalizes seed vertices if their learned labels are far away from the ground truths. Note that the regularization term provides an add-on property that utilizes additional domain-specific knowledge to improve learning accuracy.

In conclusion, our problem definition has addressed all the mentioned modeling challenges. Unfortunately, besides modeling challenges, there are several computational challenges remaining to be solved. First, the NP-hardness of Problem 1 (the subproblem of nonnegative matrix factorization is NP-hard [36]) requires efficient solutions for large-scale real problems. Second, the rapid growth of data and feedback requires fast incremental updates. In the following, we address those computational challenges by developing efficient algorithms that are highly scalable and achieve high quality in terms of classification accuracy for real-life tasks.

4. LABEL INFERENCE ALGORITHMS

In this section, we discuss the algorithmic issues in the label inference process. Specifically, our label inference problem leads to an optimization problem, which searches for the optimum weighted propagation matrix B and label assignment matrix Y to minimize the objective function in Eq. (1). Considering these nonnegative constraints, different approaches [18, 26, 21] have been proposed to solve this non-convex optimization problem. Among them, the multiplicative update [25] and additive update rules [26] are the two most popular approaches because of their effectiveness. The multiplicative update approach batch-updates each entry of the matrix by multiplying by a positive coefficient at each iteration, while the additive update approach is a projected gradient descent method. To the best of our knowledge, there is no existing work that combines different update rules in a unified framework.

In this paper, we propose to combine these two update rules in a unified framework. We observe that the typical multiplicative update rule that batch-updates each entry of the matrix can be transferred to a vertex-centric rule that corresponds to updating each row of the matrix per iteration. This transformation allows us to unify both rules under the same vertex-centric label propagation framework because many additive rules also update each row per iteration [26, 21]. We notice that multiplicative and additive update rules share some common computations. Consequently, by pre-computing these common terms, our framework can be independent of the particular update rule, while remaining as efficient as possible. The proposed framework enjoys three important by-products: (1) it supports fair comparison between different update rules, (2) the unified incremental algorithm in Section 5 naturally supports various update rules, and (3) it is easy to parallelize because the updates of each vertex can be performed at the same time.

Algorithm 1 presents the framework of our vertex-centric label inference algorithm, which consists of three steps: initialization, update for the propagation matrix B, and update
for the label assignment of each vertex Y(u). We first provide a non-negative initialization for both Y and B in Section 4.1. We next iteratively update B (Section 4.3) and Y (Section 4.2) until the solution converges (Lines 2–10) with both multiplicative and additive rules. The computational cost is dominated by updating the label assignments for all the vertices (Lines 5–6). To reduce the computational cost, when updating the label assignment matrix Y in Section 4.2, we design an efficient cache technique that pre-computes and reuses common terms shared by the same-type vertices (i.e., A_t for each t-type vertex) for both update rules. We show that by pre-computing A_t, the computational time for each Y_t decreases, and consequently the computational cost per iteration is much reduced.

Algorithm 1 The unified label inference framework
Input: Graph G, a few ground truths Y*
Output: Label matrix Y
01: Initialize Y and B (see Section 4.1)
02: repeat
03:   update B_tt' (see Section 4.3)
04:   update common terms A_t (see Eq. (6))
      /* Vertex-centric search (block coordinate search) */
05:   for each vertex u ∈ V
06:     update and/or normalize Y(u) (see Section 4.2)
07:   Y_s = Y
08:   Y = argmin_Y ||Y_s − Y||_F^2 + λ regularizer(G, Y)
09: until converges
10: return Y

In the following, we first present the details of the three components in our unified algorithm framework in Sections 4.1–4.3; we then show the equivalence between the element-wise multiplicative and vertex-centric multiplicative rules, and analyze the theoretical properties of different update rules in Section 4.4.

4.1 Initialization

To achieve a better local optimum, a label inference algorithm should start from one or more relatively good initial guesses. In this work, we focus on graph proximity based initialization for the label assignment matrix Y, while the B matrix is initialized using observed label propagation information among labeled seed vertices.

Initializing Y. Given the set of labeled vertices V^L with ground truth labels Y* ∈ R^{n_L×k}, we utilize graph proximity to initialize the unlabeled vertices with similarly labeled, same-type vertices. Specifically, the label assignment matrix Y^0 can be initialized as follows:

Y^0(u) = Y*(u)                                 if u ∈ V^L
         avg_{v∈V_t(u)} sim(u,v,G) Y(v)        otherwise       (2)

where sim(u,v,G) evaluates the graph proximity between vertices u and v. In our experiments, we define sim(u,v,G) as the normalized Adamic-Adar score [2]. In order to evaluate the similarity between two vertices, we sum the number of neighbors the two vertices have in common. Neighbors that are unique to a few vertices are weighted more than commonly occurring neighbors. That is,

sim(u,v,G) = Σ_{w∈N(u)∩N(v)} 1 / log d(w)       (3)

We first compute sim(u,v,G) with Eq. (3), and then normalize all the scores into the range [0,1]. Here d(u) is the degree of vertex u, and N(u) is the set of neighbors of vertex u.

Initializing B. For each B_tt', we initialize it based on the label class information and vertex type information of the observed labeled vertices. Specifically, we first initialize each B_tt' as an identity matrix, where we assume that if type-t vertices are in class l_i, then all their connected type-t' vertices will receive the corresponding class label l_i. We then increment B_tt'(l_i, l_j) by one whenever we observe that a type-t vertex labeled l_i is connected to a type-t' vertex labeled l_j. Finally, we normalize each B_tt' using the L1 norm.

4.2 Update Y

As introduced earlier, we perform a vertex-centric update for Y. That is, in each iteration, we focus on minimizing the following sub-objective:

J(Y(u)) = Σ_{v∈N(u)} (G(u,v) − Y(u)^T B_t(u)t(v) Y(v))^2
        + Σ_{v∉N(u), t(v)≠t(u)} (Y(u)^T B_t(u)t(v) Y(v))^2       (4)
        + 1_{V^L}(u) β ||Y(u) − Y*(u)||_F^2

where 1_A(x) is the indicator function, which is one if x ∈ A and zero otherwise, and N(u) is the set of neighbors of vertex u.

In the following, we adopt two representative update rules, the multiplicative rule and the additive rule, to derive the optimal solution for Eq. (4).

Lemma 1 (Multiplicative rule for Y). Y can be approximated via the following multiplicative update rule:

Y(u) = Y(u) ∘ sqrt( (Σ_{v∈N(u)} G(u,v) B_t(u)t(v) Y(v) + β 1_{V^L}(u) Y*(u))
                  / (A_t(u) Y(u) + β 1_{V^L}(u) Y(u) + ε) )       (5)

where ε > 0 is a very small positive value (e.g., 10^-9), and A_t is defined as follows:

A_t = Σ_{v∉V_t} B_tt(v) Y(v) Y(v)^T B_tt(v)^T       (6)

Proof: The proof can be derived in the spirit of the classic multiplicative algorithm [25] for non-negative matrix factorization with the KKT conditions [23]. Details are presented in Appendix 7.1.

Lemma 2 (Additive rule for Y). An alternative approximate solution to Y can be derived via the following additive rule:

Y(u)^{r+1} = max(ε, Y(u)^r + 2η(Σ_{v∈N(u)} G(u,v) B_t(u)t(v) Y(v)
             − A_t Y(u) + β 1_{V^L}(u)(Y*(u) − Y(u)^r)))       (7)

where η is the step size, and A_t is defined in Eq. (6).

Proof: It can be easily derived by plugging the derivative of J(Y(u)) into the standard gradient descent update rule.

Step size for the additive rule. We use Nesterov's method [17], [28] and [43] to compute the step size η, which can be estimated using the Lipschitz constant L for ∇J(Y(u)); see Appendix 7.2.
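To make the vertex-centric update concrete, the following NumPy sketch performs one sweep of the multiplicative rule of Lemma 1, with the shared term A_t of Eq. (6) cached once per type. This is a minimal illustration under our own assumptions, not the authors' implementation: the dense adjacency matrix, the dictionary-of-matrices encoding of B, and all names (precompute_A, multiplicative_sweep, nbrs) are hypothetical choices for a small toy graph.

```python
import numpy as np

def precompute_A(t, types, Y, B):
    """Cache A_t from Eq. (6): sum over vertices v outside type t of
    B_{t,t(v)} Y(v) Y(v)^T B_{t,t(v)}^T.  (k x k matrix)"""
    k = Y.shape[1]
    A = np.zeros((k, k))
    for v, tv in enumerate(types):
        if tv != t:
            Bv = B[(t, tv)]                       # k x k propagation matrix
            A += Bv @ np.outer(Y[v], Y[v]) @ Bv.T
    return A

def multiplicative_sweep(G, types, nbrs, Y, Ystar, labeled, B,
                         beta=1.0, eps=1e-9):
    """One vertex-centric sweep of the multiplicative rule (Lemma 1).

    G:       dense n x n adjacency (toy-scale stand-in for a sparse graph)
    types:   type id per vertex; nbrs: neighbor list per vertex
    labeled: set of seed vertex ids with ground truth rows in Ystar
    B:       dict mapping (type_t, type_t') -> k x k propagation matrix
    """
    A = {t: precompute_A(t, types, Y, B) for t in set(types)}
    Ynew = Y.copy()
    for u in range(len(types)):
        tu = types[u]
        num = np.zeros(Y.shape[1])
        for v in nbrs[u]:                         # observed-link term
            num += G[u, v] * (B[(tu, types[v])] @ Y[v])
        den = A[tu] @ Y[u] + eps                  # cached common term
        if u in labeled:                          # seed penalty term
            num += beta * Ystar[u]
            den += beta * Y[u]
        Ynew[u] = Y[u] * np.sqrt(num / den)       # elementwise, Eq. (5)
    return Ynew
```

Repeating such sweeps (with normalization) mirrors Lines 5–6 of Algorithm 1; the additive rule of Lemma 2 could reuse the same A_t cache, which is the point of pre-computing it.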
2 and 3 [21], any limited point of the sequence generated
Table 3: Time complexity of basic operators, where
by Algorithm 1 reaches the stationary point if the update
n is number of nodes, m is number of edges, and k
rules remain non-zero and achieve optimum. Therefore, in
is number of label classes.
general, if any update rule in Algorithm 1 is optimum for
B Y A
Multi O(n+m)k O(n+m)k (cid:80)t(cid:48)(cid:54)=ttnt(cid:48)k2 boWthesunbexotbajnecatliyvzees,tihtelecaodnsvetrogtehnecestpartoiopneartriyespowinhte.n using
Addti O(n+m)k O(n+m)k (cid:80)t(cid:48)(cid:54)=tnt(cid:48)k2 both multiplicative update rules and additive update rules.
Although both the multiplicative updating rules and addi-
tive rules used in this work are not optimum for subprob-
4.3 UpdateB lems, they still have very nice convergence properties. As
In the following, we present the detailed update rules for proved on Theorem 2.4 [6], the proposed additive rules still
propagation B. convergeintoastationarypoint. Forthemultiplicativerule,
we conclude that using the proposed multiplicative updat-
Lemma 3 (Multiplicative rule for B) B can be derived
ing rules guarantees that the value of objective function is
via the following update rule:
non-increasing, and thus the algorithm converges into a lo-
(cid:115) caloptima. Thisisbecausetheproposedmultiplicativerules
Btt(cid:48) =Btt(cid:48) ◦ YtTYYtBtTtGt(cid:48)Yttt(cid:48)T(cid:48)YYt(cid:48)t(cid:48) +(cid:15) (8) ainreLiedmenmtiaca6l,taontdheZthruadeittiaoln.a[4l2m]uhlatvipelipcraotviveedrtuhleastatsheprvoavleude
of objective function is non-increasing with the traditional
Proof: Proof of this Lemma is similar to Lemma 1, as multiplicative rules.
shown in Appendix 7.3.
5. INCREMENTALLABELINFERENCE
Lemma 4 (Additive rule for B) An alternative approx-
imatesolutiontoB canbederivedviathefollowingadditive In this section, we present our solution to the fundamen-
rule: talresearchquestionwithpracticalimportance: How can we
support fast incremental updates upon graph updates
Btt(cid:48) =max((cid:15),Btt(cid:48) +2ηb(YtTGtt(cid:48)Yt(cid:48) −YtTYtBtt(cid:48)YtT(cid:48)Yt(cid:48))) such as new labels and/or new node/edges? Thisisbe-
(9)
cause in practice graphs and labels are continuously chang-
where η again denotes the step size.
b ing. In the following, we first develop an incremental algo-
rithm that adaptively updates label assignment upon new
Proof: It can be easily derived by replacing the deviation
data, where we can control the trade-off between efficiency
of J(B_tt') into the standard gradient descent update rule. Similar to the computation of η with Nesterov's gradient method, η can be computed with the Lipschitz constant L = 2‖Y_t'^T Y_t' Y_t^T Y_t‖_F for ∇J(B_tt'); see Appendix 7.4.

4.4 Comparison and Analysis

We first show that the solution Y returned by the proposed multiplicative rule is identical to that returned by the traditional multiplicative rule, as proved in the following lemma.

Lemma 5. Updating the label assignment Y vertex by vertex using Lemma 1 is identical to the following traditional multiplicative rule [42]:

    Y_t = Y_t ∘ √( (Σ_{t'≠t} G_{tt'} Y_{t'} B_{tt'}^T + β S_t Y^0) / (Σ_{t'≠t} Y_t B_{tt'} Y_{t'}^T Y_{t'} B_{tt'}^T + β S_t Y_t) )

where S ∈ R^{n×n} is the label indicator matrix, of which S_uu = 1 if u ∈ V^L and zero for all the other entries, and S_t is the submatrix of S for t-type vertices.

Proof: The detailed proof is shown in Appendix 7.5.

We then analyze the time complexity of computing each basic operator in a single iteration. As outlined in Table 3, each basic operator can be computed efficiently in real sparse networks. In addition, because of our pre-computation of A_t, the multiplicative and additive update rules have the same near-linear computational cost in a single iteration, which is much smaller than that of many traditional multiplicative rules for Y. For example, applying the multiplicative rule proposed by Zhu et al. [42] leads to O(n_a n_b k + n_a n_c k + n_b n_c k) computation per iteration.

Convergence. Let us first examine the general convergence of our label inference framework. Based on Corollary 1, [...]

[...] and accuracy. We then further explore another interesting question: under which conditions is it faster to perform an incremental update than to recompute from scratch? To address this issue, we propose a utility function that examines the reward and cost of both update operations. With this utility function, our framework is able to automatically determine the "best" strategy based on different levels of changes.

5.1 Incremental Update Algorithm

Our incremental update algorithm supports both graph changes and label changes. The first scenario includes vertex/edge insertion and deletion, while label changes include receiving additional labels or correcting noisy labels. With the proliferation of crowdsourcing, whenever we get additional labels from users, or identify noisy labels based on user feedback, we can update the label assignment for each vertex.

A simple approach to deal with graph/label updates is to re-run our label inference algorithm on the entire graph. However, this can be computationally expensive. We thus propose an incremental algorithm, in which we perform partial updates instead of full updates for each vertex. Our incremental algorithm is built upon the intuition that, even with graph and label changes, the majority of vertices tend to keep their label assignments, or will not change them much. Therefore, we perform a "lazy" adjustment, i.e., we utilize the old label assignment and update only a small candidate set of changed vertices. Unfortunately, it is very challenging to design an efficient and effective incremental label propagation algorithm, because in label propagation the adjustment of existing vertices affects their neighbors, and even vertices that are far away. This requires us to propose an effective strategy to choose which subset of vertices to adjust so as to guarantee the performance gain of the incremental algorithm.
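As a concrete reference for the batch rule that the incremental machinery falls back on, the traditional multiplicative rule of Lemma 5 can be sketched in NumPy. This is a minimal sketch under our own naming — `multiplicative_update`, the `neighbors` list of `(G, Y_other, B)` triples, and the `eps` division guard are ours, not the paper's implementation:

```python
import numpy as np

def multiplicative_update(Yt, neighbors, beta, St, Y0, eps=1e-12):
    """One pass of the traditional multiplicative rule (Lemma 5) for the
    label matrix Y_t of one vertex type.

    neighbors : list of (G, Y_other, B) triples, one per other vertex
        type t', holding the adjacency block G_tt', the other type's
        label matrix Y_t', and the propagation matrix B_tt'.
    St : diagonal 0/1 matrix marking t-type vertices with ground truth.
    Y0 : seed label matrix for t-type vertices.
    """
    num = beta * (St @ Y0)               # β S_t Y^0
    den = beta * (St @ Yt)               # β S_t Y_t
    for G, Yo, B in neighbors:
        num += G @ Yo @ B.T              # G_tt' Y_t' B_tt'^T
        den += Yt @ B @ Yo.T @ Yo @ B.T  # Y_t B_tt' Y_t'^T Y_t' B_tt'^T
    # element-wise: Y_t <- Y_t ∘ sqrt(num / den); eps avoids 0/0
    return Yt * np.sqrt(num / (den + eps))
```

Because the rule is multiplicative, zero entries of Y_t stay zero and non-negativity of the label assignment is preserved automatically.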
Let us use the following example to illustrate the challenge.

Figure 4: Two new vertices are inserted into the Tripartite graph in Fig. 3.

Example 1. Fig. 4 shows an example of initializing the candidate set of vertices to be adjusted when the graph receives updates. Two vertices, 9 and 10, are inserted with newly formed edges into the tripartite graph shown in Fig. 3(a), which consequently leads vertices 5, 6, and 8 to receive new links from vertices 9 and 10. Therefore, the subset of vertices {5, 6, 8, 9, 10} (i.e., the subgraph bounded by the red box) receives graph changes, and their label embeddings require updating. However, updating the position of vertex 6 in the latent space might change that of vertex 2, or even of all the remaining vertices. It is unclear to what extent we should propagate those changes: too aggressive a policy leads to an entire update (equivalent to recomputing), while too conservative a policy leads to a great loss in accuracy.

Overview of the incremental algorithm. We develop an incremental algorithm based on mean field theory, which assumes that the effect of all the other individuals on any given individual is approximated by a single averaged effect. Hence we choose the candidate set of vertices based on how much they differ from the averaged effect. The overall process is outlined in Algorithm 2. We first identify the small portion of changes ∆V as our initial candidate set of vertices cand (Line 3), where ∆V denotes the set of changed vertices (e.g., {5, 6, 8, 9, 10} in Fig. 4), including new and deleted vertices, vertices that have new or deleted edges, and vertices which receive updated ground truth labels. Next, we iteratively perform a conditioned label assignment update for each vertex in the candidate set (Lines 6–7), as well as an update of the candidate vertex set cand (Lines 8–10). When updating the candidate vertices, we include a neighbor of an existing candidate vertex into cand only if it satisfies the condition (Line 9) that is based on the pre-computed values of w and σ (Lines 4–5). Here w denotes the average effect of any given individual, σ denotes the standard deviation of the effects of any given individual, and θ is a confidence level parameter for the trade-off between efficiency and accuracy. We make the assumption that the propagation matrix B is an inherent property and can be accurately estimated from the sampled old data (i.e., B does not change with the new data). The details of the candidate vertex update are presented as follows.

Algorithm 2: Incremental label inference algorithm with changes
Input: graph G, old label matrix Y, a few ground truths Y*, confidence level θ ∈ [0,1), and changes ∆V
Output: new label matrix Y_n
01: cand = ∆V
02: for each u ∈ G
03:   Y_n(u) = Y(u)
04: w_tt' = avg_{(u,v)∈E_tt'} Y(u)^T B_tt' Y(v),  σ_tt' = std_{(u,v)∈E_tt'} Y(u)^T B_tt' Y(v)
05: repeat
06:   for each vertex u ∈ cand
07:     update Y_n(u) (see Section 4.2)
08:     for each v ∈ N(u), v ∉ cand
09:       if |Y_n(u)^T B_t(u)t(v) Y_n(v) − w_t(u)t(v)| ≥ √(1/(1−θ)) · σ_t(u)t(v)
10:         cand = cand ∪ {v}
11: until Y_n converges
12: return Y_n

Update of the candidate vertex set. The set of candidate vertices (cand) denotes the subset of vertices whose label assignment requires updating due to graph/label changes. Basically, for each current vertex u in cand, we update its label assignment and examine its propagation behavior over each neighbor vertex v. Intuitively, the modification of one vertex can cause its relations to its neighbors to be adjusted, whereas the general behavior between the corresponding two types of vertices should not change much. More specifically, let w_tt' = avg_{(u,v)∈E_tt'} Y(u)^T B_tt' Y(v) denote the averaged effect of any individuals between t-type and t'-type individuals, and let σ_tt' = std_{(u,v)∈E_tt'} Y(u)^T B_tt' Y(v) denote the standard deviation of the effects between t-type and t'-type individuals. If the estimated effect between u and v after adjustment significantly differs from the averaged effect (i.e., w_t(u)t(v)) within the same types, we add vertex v into cand. The significance is evaluated using a threshold that consists of the confidence parameter θ and σ_tt'.

Figure 5: Vertex 7 of the Tripartite graph in Fig. 3 obtains a new label.

Example 2. Consider again the example shown in Fig. 4: the graph changes activate changes of the embedded positions of vertices {5, 6, 8, 9, 10} (i.e., the vertices in the box). The movement of the changed vertices in the label embedding space consequently causes their neighbor vertices to leave their old embedded positions. For example, if vertex 6 is moved further toward the red axis, its neighbor vertex 2 might be required to move away from the green axis too. Similarly, in Fig. 5, when vertex 7 receives a new label, vertex 7 is pushed closer to the green axis, which further influences the movement of both vertex 1 and vertex 4. Meanwhile, the new label also strengthens the green label propagation to vertex 1.

Confidence level parameter θ for speed up. We now discuss the effect of the parameter θ, which controls the percentage of neighbors that avoid a label update in each iteration (Line 9).
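In code, the test of Line 9 over one type pair can be vectorized in a few lines. The sketch below is our own illustration (the name `expansion_mask` and the flat `edge_effects` array of pre-computed Y(u)^T B_tt' Y(v) values per edge are ours):

```python
import numpy as np

def expansion_mask(edge_effects, theta):
    """Boolean mask over edges of one type pair (t, t'):
    True where the effect deviates from the type-pair average w_tt'
    by at least sqrt(1/(1-theta)) standard deviations
    (the condition of Line 9 in Algorithm 2)."""
    w = edge_effects.mean()        # w_tt'
    sigma = edge_effects.std()     # sigma_tt'
    q = np.sqrt(1.0 / (1.0 - theta))
    return np.abs(edge_effects - w) >= q * sigma
```

By Chebyshev's inequality, the flagged fraction can never exceed 1−θ, regardless of the distribution of the effects, which is exactly the efficiency bound stated in Theorem 1 below.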
A larger value of θ indicates that more neighbors are filtered out from the update, thus leading to a higher confidence that the incremental algorithm is more efficient than a re-computing approach. One nice property of Algorithm 2 is that it can bound the number of candidate vertices requiring a label assignment update in each iteration. In particular, we present the following theorem, which provides the efficiency guarantee for our incremental algorithm.

Theorem 1. In each iteration, the probability P_c that a neighbor of any candidate vertex requires a label assignment update is upper bounded by 1−θ. That is, P_c ≤ 1−θ.

Proof (sketch): Let X denote a random variable that represents the value of Y(u)^T B_t(u)t(v) Y(v) for any pair of linked vertices (u, v) between t-type and t'-type vertices. Then w_tt' is the average value of X, and σ_tt'^2 is the variance of X.

By Chebyshev's inequality, if X is a random variable with finite expected value u and finite non-zero variance σ^2, then for any real number q > 0 we have:

    Pr(|X − u| ≥ qσ) ≤ 1/q^2    (10)

Replacing u by w_tt', σ by σ_tt', and q by √(1/(1−θ)), we have:

    Pr(|X − w_tt'| ≥ √(1/(1−θ)) σ_tt') ≤ 1−θ    (11)

Eq. (11) exactly gives the maximum bound on the probability of reaching Line 9 in Algorithm 2. Since the probability of reaching Line 9 in Algorithm 2 is identical to the probability that a neighbor of any candidate vertex requires a label assignment update, this completes the proof. ∎

In conclusion, the parameter θ roughly provides a utility that controls the estimated speed up of the incremental algorithm over a re-computing approach. A larger value of θ indicates that more neighbors are filtered out from the update, leading to a higher confidence that the incremental algorithm is more efficient than a re-computing approach. On the other hand, a larger value of θ leads to fewer updates of neighbors and subsequently to lower accuracy. From this perspective, the parameter θ allows us to choose a good trade-off between computational cost and solution accuracy.

5.2 To Re-compute or Not?

In online label inference, we continually receive new data and/or new evidence (labels). Conditioning on the new data and new evidence, we have two choices: we can recompute the label assignment for all the vertices using full label inference; or we can fix some of the previous results and only update a certain subset of the vertices using an incremental algorithm. To understand the consequences of using an incremental algorithm, we must answer a basic question: how much accuracy loss is incurred, and how much speed up is achieved, by an incremental algorithm compared to a re-computing approach?

We thus define the utility function for the incremental algorithm f_I as follows:

    U(f_I) = u_s Gain(f_I, f_R) − u_a Loss(f_I, f_R)    (12)

where Gain(f_I, f_R) = |G ∪ ∆G| / ((2−θ)|∆G|) is the computational gain achieved by the incremental algorithm f_I compared to the re-computing algorithm f_R, ∆G is the subgraph induced by the changed vertices ∆V, Loss(f_I, f_R) is the information loss of using the incremental algorithm f_I instead of the re-computing approach f_R, u_s is the reward unit for speed up, and u_a is the reward unit for accuracy.

We next examine where the information loss of an incremental algorithm comes from. Basically, the accuracy loss of the incremental algorithm comes from two parts: the loss on V− due to fixing their label assignments, and the loss on V+ due to approximating the label assignments for V+ with the previously assigned labels for V− fixed. Therefore, we have:

    Loss(f_I, f_R) = D(Y|f_R, Y|f_I, V+) + D(Y|f_R, Y_o, V−)    (13)

where Y|f denotes the label assignments produced by label inference operator f, Y_o denotes the previously assigned labels, and D(Y1, Y2, V) denotes the label assignment differences of vertex set V between two assignments Y1 and Y2.

Unfortunately, it is non-trivial to examine the differences between the label assignments of a re-computing approach and those of an incremental algorithm. This is a chicken-and-egg paradox: one wants to decide which algorithm to use based on the information loss (label assignment differences), while without applying both algorithms one cannot get an accurate understanding of the differences. In order to proceed, we estimate the information loss by examining the similarity between the old data and the new data, which is inspired by concept drift modeling [12, 45] for supervised learning. The intuition is that if the distribution of the new data significantly varies from that of the old data, an incremental update results in higher information loss; if the new data are very similar to the old data, an incremental update leads to much less information loss. Specifically, we use the graph proximity heuristic defined in Eq. (3) to simulate the similarity between new data and old data, and consequently the information loss, which leads to the following equations:

    D(Y|f_R, Y|f_I, V+) = | avg_{u∈V−, v∈V+} sim(u, v, G ∪ ∆G) − avg_{u∈V−, v∈V+} sim(u, v, ∆G) |    (14)

    D(Y|f_R, Y_o, V−) = | avg_{u∈V^L, v∈V−} sim(u, v, G ∪ ∆G) − avg_{u∈V^L, v∈V−} sim(u, v, G) |    (15)

Though information loss might happen when using an incremental algorithm, much less time is consumed in computation compared to any re-computing operation, especially when the percentage of changes is very small.

6. EXPERIMENTS

6.1 Datasets and Settings

We evaluate the proposed approaches on four real datasets with three different classification tasks: sentiment classification, topic classification and rating classification. Among the four datasets, Prop 30 and Prop 37 are two tripartite graphs created from 2012 November California Ballot Twitter Data [42], each of which consists of tweet vertices, user vertices, and word vertices. The label classes are sentiments: positive, negative, and neutral. The PubMed dataset [24] is represented as a tripartite graph with three types of vertices: papers, reviewers, and words. Each paper/reviewer is associated with multiple labels, which denote a set of subtopics. The MovieLen dataset [31] represents the folksonomy information among users, movies, and tags. Each movie vertex is associated with a single class label.
Here we use three coarse-grain rating classes, "good", "neutral", and "bad", as the ground truth labels. A short description of each dataset is summarized in Table 4.

Let MRG/ARG denote the multiplicative/additive rule update with graph heuristic initialization. We compare our approaches with three baselines: GRF [44], MHV [9] and BHP [13]. GRF is the most representative traditional label propagation algorithm (i.e., no B matrices, or B matrices that are identity matrices), MHV is selected as a representative method that supports vertex-level heterogeneity (B matrices are diagonal), and BHP denotes the label propagation algorithm that allows propagation-level heterophily and utilizes a single matrix B. For all of these approaches, we begin with the same initialized state, and we use the same regularizations (or no regularizations) for all approaches. Note that our goal in this paper is not to justify the performance of semi-supervised learning for different classification tasks (various surveys have justified the advantage of semi-supervised learning with few labeled data), but rather to propose a better semi-supervised label propagation algorithm for tripartite graphs. Therefore, we do not compare our approaches with other supervised methods such as support vector machines.

We evaluate the effectiveness of each approach in terms of classification accuracy. Specifically, we select [1%, 5%, 10%] of the vertices with ground truth labels as the set of seed labeled vertices, and then run the different label propagation algorithms over the entire network to label the remaining vertices. Note that the selection of seed labeled vertices is not the focus of this work, and thus we simply adopt degree centrality to select the top [1%, 5%, 10%] of vertices as seed nodes. For both single- and multi-class labels, we assign label l_i to a vertex u if Y(u, l_i) > 1/k, and validate the results against the ground truth labels. We represent classification results as a contingency matrix A, with A_ij for i, j ∈ L = {l_1, ..., l_k}, where k is the number of class labels and A_ij is the number of times that a vertex of true label l_i is classified as label l_j. With the contingency matrix A, the classification accuracy is defined as:

    Accuracy = (Σ_i A_ii) / (Σ_{i,j} A_ij)

The classification accuracy cannot well evaluate the performance if the label distribution is skewed. Therefore, we also use the Balanced Error Rate [30] to evaluate the classification quality. The balanced error rate is defined as:

    BER = 1 − (1/k) Σ_i ( A_ii / Σ_j A_ij )

Table 4: The statistics of datasets. "single label" denotes whether each vertex can be associated with one or more labels; "max label" denotes the class label l_m that has the maximum number of vertices; and "% max label" denotes the percentage of vertices that have ground truth label l_m.

    Data          Prop30    Prop37    MovieLen   PubMed
    #nodes        23,025    62,383    26,850     36,813
    #edges        168,131   542,028   168,371    263,085
    #classes      2         2         3          19
    single label  Yes       Yes       Yes        No
    %n_a          3.6%      3.1%      14.9%      0.3%
    %n_b          45.7%     54.5%     56.8%      0.6%
    %n_c          50.7%     42.4%     28.3%      99.1%
    %max label    67%       90%       67%        23%

All the algorithms were implemented in Java 8 on a PC with an i7 3.0GHz CPU and 8GB memory.

Figure 6: Classification quality comparisons. The higher the accuracy (the lower the balanced error rate), the better the quality.

Figure 7: The automatically learned propagation matrices B by the better algorithm between ARG and MRG (best viewed in color). Panels: (a) Prop 30, MRG; (b) Prop 37, ARG; (c) MovieLen, ARG; (d) PubMed, MRG.

6.2 Static Approach Evaluation

In this section, we evaluate the performance of Algorithm 1 in terms of convergence, efficiency and classification accuracy.

Question 1 (Accuracy): How does Algorithm 1 perform compared to the baselines?

Result 1: Algorithm 1 outperforms the baselines in terms of classification accuracy and balanced error rate when the data exhibit heterophily and/or have multi-class labels.

Fig. 6 reports the classification accuracy and balanced error rate comparisons for all the approaches. Here we set the convergence tolerance value to 10^-6. The parameter β is set to 5; in our preliminary experiments, we noticed that the accuracy increases when β is increased from 1 to 5, but after that, increasing β does not lead to significant changes in accuracy. Clearly, our approaches MRG and ARG perform much better than the other approaches on Prop 30, MovieLen, and PubMed, and perform similarly to GRF on Prop 37. The results validate the advantage of the inferred B matrices in terms of supporting vertex-level heterogeneity, propagation-level heterogeneity and multi-class label propagation.

For example, Prop 37 is about labeling genetically engineered foods, and the majority of people have a positive attitude. As shown in Fig. 7(b), all of the B matrices learned by ARG are diagonal matrices, which illustrates that the link formation exactly follows the homophily assumption. Hence, on Prop 37, forcing the B matrices to be identity matrices, as GRF does, obtains comparable quality to our approaches. Moreover, although both PubMed (see Fig. 7(d)) and Prop 37 exactly follow the homophily assumption, the proposed approaches perform better than the other approaches on PubMed. This is because PubMed has multi-class labels, and our approaches support multi-class label propagation well compared to the other approaches.
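As a concrete reference for the two quality measures used throughout this section, the Accuracy and Balanced Error Rate of Section 6.1 can be computed from the contingency matrix A as follows (a minimal NumPy sketch; the function names are ours):

```python
import numpy as np

def accuracy(A):
    """Accuracy = sum_i A_ii / sum_ij A_ij over the contingency matrix A."""
    return np.trace(A) / A.sum()

def balanced_error_rate(A):
    """BER = 1 - (1/k) * sum_i (A_ii / sum_j A_ij), i.e. one minus the
    average per-class recall, with k the number of class labels."""
    k = A.shape[0]
    per_class_recall = np.diag(A) / A.sum(axis=1)
    return 1.0 - per_class_recall.sum() / k
```

For a heavily skewed label distribution, a classifier that always predicts the majority label can still score a high Accuracy, but its BER approaches 1 − 1/k, which is why both numbers are reported in Fig. 6.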
On Prop 30 and MovieLen, the B matrices are a mixture of different forms of matrices. Under this situation, our approaches MRG and ARG perform better than all the other approaches, including BHP (which uses a single arbitrary B matrix).

Question 2 (Convergence): Do the multiplicative and additive rules in Algorithm 1 converge?

Result 2: Both the multiplicative and additive rules are guaranteed to converge to local optima, and their convergence rates in terms of objective values are much faster than the baselines.

Instead of fixing the convergence tolerance value, we now fix the maximum iteration number to 100 and validate the convergence performance of the proposed algorithms. Fig. 8 shows that the objective values are non-increasing under both the multiplicative and additive rules. In addition, the proposed algorithms decrease the objective value much faster than the other algorithms.

Figure 8: Convergence comparison. The y axis shows the ratio between √J_r and √J_0, where J_r is the objective value at the r-th iteration and J_0 is the objective value at initialization. Panels: (a) Prop 30; (b) Prop 37; (c) MovieLen; (d) PubMed.

Question 3 (Efficiency): Are the proposed algorithms scalable?

Result 3: In each iteration, the proposed algorithms MRG and ARG require more computational cost than the baselines. However, since they converge much faster than the other approaches, the total running time of the proposed algorithms is still smaller than that of the baselines.

We again fix the convergence tolerance value to 10^-6 and report the average running time per iteration and the total running time in Fig. 9. Although the proposed algorithms are computationally more expensive per iteration, due to the additional cost incurred for computing the propagation matrices B, on average they converge 2-3 times faster than all the baselines, so they are still very efficient for large-scale data.

Figure 9: Efficiency comparison. Note that time per iteration is on a millisecond scale and total running time is on a second scale.

Question 4 (Regularization): What is the effect of the graph regularization?

Result 4: We validate that the graph regularization term is helpful for the sentiment classification tasks on Prop 30 and Prop 37.

Table 5: The effect of graph regularization.

              Prop30            Prop37
              w       w/o       w       w/o
    MRG       0.869   0.848     0.935   0.908
    ARG       0.867   0.781     0.942   0.928

Intuitively, a user-to-user graph (e.g., a friendship graph) or a document graph (e.g., a citation graph) will be very helpful for labeling users or documents. Unfortunately, we do not have such graphs for MovieLen and PubMed. Therefore, although our framework is very general and supports various regularizations, we only compare the classification accuracy with/without graph regularization on Prop 30 and Prop 37 (i.e., the two data sets that have additional user-to-user re-tweeting graphs). With the additional regularization, the classification accuracy increases by 6.1% on Prop 30 and 2% on Prop 37.

Question 5 (MRG vs. ARG): Does MRG perform better than ARG, or vice versa?

Result 5: The two update rules exhibit similar behaviors in terms of accuracy, convergence and running time.

Interestingly, we observe that there is no clear winner between the two update rules. The result demonstrates that our unified algorithm can serve as a framework for comparison between different update rules.

6.3 Incremental Approach Evaluation

We justify the advantage of our incremental approaches in terms of guaranteed speed up and decent classification