Joint Inference of Multiple Label Types in Large Networks

Deepayan Chakrabarti  [email protected]  Facebook Inc.
Stanislav Funiak  [email protected]  Facebook Inc.
Jonathan Chang  [email protected]  Facebook Inc.
Sofus A. Macskassy  [email protected]  Facebook Inc.

This is a technical report in preparation for submission to a peer-reviewed conference.

Abstract

We tackle the problem of inferring node labels in a partially labeled graph where each node in the graph has multiple label types and each label type has a large number of possible labels. Our primary example, and the focus of this paper, is the joint inference of label types such as hometown, current city, and employers, for users connected by a social network. Standard label propagation fails to consider the properties of the label types and the interactions between them. Our proposed method, called EDGEEXPLAIN, explicitly models these, while still enabling scalable inference under a distributed message-passing architecture. On a billion-node subset of the Facebook social network, EDGEEXPLAIN significantly outperforms label propagation for several label types, with lifts of up to 120% for recall@1 and 60% for recall@3.

[Figure 1: An example graph of u and her friends, with friend nodes labeled hometown=H, current city=C, and current city=C′. Caption: The hometown friends of u coincidentally contain a subset with current city C′. This swamps the group from u's actual current city C, causing label propagation to infer C′ for u. However, our proposed model (called EDGEEXPLAIN) correctly explains all friendships by setting the hometown to be H and the current city to be C.]

1. Introduction

Inferring labels of nodes in networks is a common classification problem across a wide variety of domains, ranging from social networks to bibliographic networks to biological networks and more. The typical goal is to predict a single label of low dimensionality for each node in the network (say, whether a webpage in a .edu domain belongs to a professor, a student, or the department), given a partially labeled network and possibly attributes of the nodes. In this paper we instead consider the problem of inferring multiple fields such as the hometowns, current cities, and employers of users of a social network, where users often only partially fill in their profile, if at all. Here, we have multiple types of missing labels, where each label type can be very high-dimensional and correlated. Joint inference of such label types is important for many ranking and relevance applications such as friend recommendation, ads and content targeting, and user-initiated searches for friends, motivating our focus on this problem.

One standard method of label inference is label propagation (Zhu & Ghahramani, 2002; Zhu et al., 2003), which tries to set the label probabilities of nodes so that friends have similar probabilities. While this method succinctly captures the essence of homophily (the more two nodes have in common, the more likely they are to connect (McPherson et al., 2001)), it optimizes for only a single type of label and assumes only a single category of relationships. It therefore fails to address the potential complexity of edge formation in networks, where nodes have different reasons to link to each other. As an example, consider the snapshot of a social network in Figure 1, where we want to predict the hometown and current city of node u, given what we know about u and u's neighbors. Here, the labels of node u are completely unknown, but her friends' labels are completely known.
Label propagation would treat each label type independently and infer the hometown of u to be the most common hometown among her friends, the current city to be the most common current city among friends, and so on. Hence, if the bulk of friends of u are from her hometown H, then inferences for current city will be dominated by the most common current city among her hometown friends (say, C′) and not friends from her actual current city C; indeed, the same will happen for all other label types as well.

Our proposed method, named EDGEEXPLAIN, approaches the problem from a different viewpoint, using the following intuition: two nodes form an edge for a reason that is likely to be related to them sharing the value of one or more label types (e.g., two users went to the same college). Using this intuition, we can go beyond standard label propagation in the following way: instead of taking the graph as given and modeling labels as items that propagate over this graph, we consider the labels as factors that can explain the observed graph structure. For example, the inferences for u made by label propagation leave u's edges from C completely unexplained. Our proposed method rectifies this, by trying to infer node labels such that for each edge u ∼ v, we can point to a reason why this is so — u and v are friends from the same hometown, or college, or the like. While we are primarily interested in inferring labels, we note that the inferred reason for each edge can be important for applications by itself; e.g., if a new node u joins a network and forms an edge with v, knowledge of the reason can help with the well-known link prediction task — should we recommend v's college friends, or friends from the same high school, etc.?

We note that a seemingly simple alternative solution — cluster the graph and then propagate the most common labels within a cluster — is in fact quite problematic. In addition to the computational cost, any clustering based solely on the graph structure ignores labels already available from user profiles, while any clustering that tries to use these labels must deal with incomplete and missing labels. The clustering must also be complex enough to allow many overlapping clusters. Hence, we believe that clustering does not readily lend itself to a solution for our problem.

Our contributions are:

1. We formulate the label inference problem as one of explaining the graph structure using the labels. We explicitly account for the fact that labels belong to a limited set of label types, whose properties we enumerate and incorporate into our model.

2. Our gradient-based iterative method for inferring labels is easily implemented in large-scale message-passing architectures. We empirically demonstrate its scalability on a billion-node subset of the Facebook social network, using publicly available user profiles and friendships.

3. On this large real-world dataset, EDGEEXPLAIN significantly outperforms label propagation for several label types, with lifts of up to 120% for recall@1 and 60% for recall@3. These improvements in accuracy, combined with the scalability of EDGEEXPLAIN, clearly demonstrate its usefulness for label inference on large networks.

The paper is organized as follows. We survey related work in Section 2. Our proposed model is discussed in Section 3, followed by the inference method in Section 4, and generalizations of the model in Section 5. Empirical evidence proving the effectiveness of our method is presented in Section 6, followed by conclusions in Section 7.

2. Related Work

We discuss prior work in semi-supervised learning, statistical relational learning, and latent models for networks.

SEMI-SUPERVISED LEARNING. Many graph-based approaches can be viewed as estimating a function over the nodes of the graph, with the function being close to the observed labels, and smooth (similar) at adjacent nodes. Label propagation (Zhu et al., 2003; Zhu & Ghahramani, 2002) uses a quadratic function, but other penalties are also possible (Zhou et al., 2004; Belkin et al., 2004; 2005). Other approaches modify the random-walk interpretation of label propagation (Baluja et al., 2008; Talukdar & Crammer, 2009). In order to handle a large number of distinct label values, the label assignments can be summarized using count-min sketches (Talukdar & Cohen, 2014). None of these approaches consider interactions between multiple label types, and hence they fail to capture the edge formation process in the graphs considered here.

STATISTICAL RELATIONAL LEARNING. These algorithms typically predict a label based on (a) a local classifier that uses a node's attributes alone, (b) a relational classifier that uses the labels at adjacent nodes, and (c) a collective inference procedure that propagates the information through the network (Chakrabarti et al., 1998; Perlich & Provost, 2003; Lu & Getoor, 2003; Macskassy & Provost, 2007). Macskassy et al. (2007) observe that the best algorithms (the weighted-vote relational neighbor classifier (Macskassy & Provost, 2007) with relaxation labeling (Rosenfeld & Hummel, 1976; Hummel & Zucker, 1983)) tend to perform as well as label propagation, which we outperform. While there has been some work on understanding how to combine and weigh different edge types for best prediction performance (Macskassy, 2007), the edge types (analogous to our reason for an edge) were given up front. We note that these algorithms typically focus on a single label type, while we explicitly model the interactions among multiple types. There is also extensive work on probabilistic relational models, including Relational Bayesian Networks (Koller & Pfeffer, 1998; Friedman et al., 1999), Relational Dependency Networks (Neville & Jensen, 2007), and Relational Markov Networks (Taskar et al., 2002). These are very general formalisms, but it is our explicit modeling assumptions regarding multiple label types that yield gains in accuracy.

LATENT MODELS. Graph structure has been modeled using latent variables (Hoff et al., 2002; Miller et al., 2009; Palla et al., 2012), but with an emphasis on link prediction. However, our goal is to make predictions about each individual user, and such latent features can be arbitrary combinations of user attributes, rather than the concrete label types we wish to predict. Other models simultaneously explain the connections between documents as well as their word distributions (Nallapati et al., 2008; Chang & Blei, 2010; Ho et al., 2012). While we do not consider the problem of modeling text data, our model permits us to incorporate node attributes, such as group memberships. Finally, the number of distinct label values in our application is very large (on the order of millions), and we suspect that the latent variables would have to have a large dimension to explain the edges in our graph well.

3. Proposed Model

In this section we first build intuition about our model using a running example. Suppose we want to infer the labels (e.g., "Palo Alto High School" and "Stanford University") corresponding to several label types (e.g., high school and college) for a large collection of users. The available data consist of labels publicly declared by some users, and the (public) social network among users, as defined by their friendship network. While the desired set of label types may depend on the application, here we focus on five label types: hometown, high school, college, current city, and employer.

Our solution exploits three properties of these label types:

(P1) They represent the primary situations where two people can meet and become friends, for example, because they went to the same high school or college.

(P2) These situations are (mostly) mutually exclusive. While there may be occasional friendships sharing, say, hometown and high school, we make the simplifying assumption that most edges can be explained by only one label type.

(P3) Sharing the same label is a necessary but not sufficient condition. For example, "We are friends from Chicago" typically implies that the indicated individuals were, at some point in time, co-located in a small area within Chicago (say, lived in the same building, or met in the same cafe), but hardly implies that two randomly chosen individuals from Chicago are likely to be friends.

(P1) is a direct result of our application; our desired label types were targeted at friendship formation. Combined with (P2), our five label types can be considered a set of mutually exclusive and exhaustive "reasons" for friendship; while this is not strictly true for high school and hometown, empirical evidence suggests that it is a good approximation (shown later in Section 6), and we defer a discussion on this point to Section 5. However, as (P3) shows, we cannot simply cast the labels as features whose mere presence or absence significantly affects the probability of friendship; instead, a more careful analysis is called for.

Formally, we are given a graph G = (V, E) and a set of label types T = {t_1, ..., t_k}. For each label type t, let L(t) denote the (high-dimensional) set of labels for that label type. Each node in the graph is associated with binary variables S_{utℓ}, where S_{utℓ} = 1 if node u ∈ V has label ℓ for label type t. Let S_V and S_H represent the sets of visible and hidden variables, respectively. We want to infer the correct values of S_H, leveraging S_V and G.

A popular method for label inference is label propagation (Zhu & Ghahramani, 2002; Zhu et al., 2003). For a single label type, this approach represents the labeling by a set of indicator variables S_{uℓ}, where S_{uℓ} = 1 if node u is labeled as ℓ and 0 otherwise. Zhu et al. (2003) relax the labeling to real-valued variables f_{uℓ} over all nodes u and labels ℓ that are clamped to one (or zero) for nodes known to possess that label (or not). They then define a quadratic energy function that assigns lower-energy states to configurations where f at adjacent nodes are similar:

    E(f) = (1/2) Σ_{u∼v} w_{uv} Σ_ℓ (f_{uℓ} − f_{vℓ})²    (1)

Here, u ∼ v means that u and v are linked by an edge, and w_{uv} is a non-negative weight on the edge u ∼ v. The minimum of Eq. 1 is found by solving the fixed-point equations

    f_{uℓ} = (1/d_u) Σ_{u∼v} w_{uv} f_{vℓ},    (2)

where d_u = Σ_{u∼v} w_{uv}. This procedure encourages f_{uℓ} of nodes connected to clamped nodes to be close to the clamped value, and propagates the labels outwards to the rest of the graph.
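For concreteness, the fixed-point update of Eq. 2 can be sketched as follows. This is a minimal illustrative sketch (not the paper's implementation), assuming unit edge weights, so that d_u is just the number of neighbors; the function and variable names are ours.

```python
# Sketch of the label-propagation fixed point (Eq. 2):
#   f_u = (1/d_u) * sum over neighbors v of w_uv * f_v,
# with known (clamped) nodes held at their observed distribution.
def label_propagation(adj, clamped, labels, iters=20):
    """adj: {node: [neighbors]}; clamped: {node: known_label}; labels: label set."""
    # Initialize every node with a uniform distribution, then clamp the seeds.
    f = {u: {l: 1.0 / len(labels) for l in labels} for u in adj}

    def clamp():
        for u, l in clamped.items():
            f[u] = {m: 1.0 if m == l else 0.0 for m in labels}

    clamp()
    for _ in range(iters):
        new_f = {}
        for u, nbrs in adj.items():
            if u in clamped or not nbrs:
                new_f[u] = f[u]
                continue
            # Average the neighbors' distributions (unit weights, d_u = len(nbrs)).
            new_f[u] = {l: sum(f[v][l] for v in nbrs) / len(nbrs) for l in labels}
        f = new_f
        clamp()
    return f
```

On a three-node path with the endpoints clamped to different labels, the middle node converges to an even split, illustrating how label propagation blends the distributions of its neighbors.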
Multiple label types can be handled similarly, by minimizing Eq. 1 independently for each type. While this formulation makes full use of (P1) and has the advantage of simplicity, it completely ignores (P2). Intuitively, label propagation assumes that friends tend to be similar in all respects (i.e., all label types), whereas what (P2) suggests is that each friendship tends to have a single reason: an edge u ∼ v exists because u and v share the same high school or college or current city, etc. This highly non-linear function is not easily expressed as a quadratic or similar variant.

Instead, we propose a different probabilistic model, which we call EDGEEXPLAIN. As described above, let S_V and S_H represent the sets of visible and hidden variables respectively; the variable S_{utℓ} is known (visible) if user u has publicly declared the label ℓ for type t, and unknown (hidden) otherwise. Then, EDGEEXPLAIN is defined as follows:

    P(S_V, S_H) = (1/Z) Π_{u∼v} softmax_{t∈T}(r(u,v,t))    (3)

    r(u,v,t) = Σ_{ℓ∈L(t)} S_{utℓ} S_{vtℓ}    (4)

    softmax_{t∈T}(r(u,v,t)) = σ( α Σ_{t∈T} r(u,v,t) + c ),    (5)

where Z is a normalization constant. Here, r(u,v,t) indicates whether a shared label of type t is the reason underlying the edge u ∼ v (Eq. 4). The softmax(r_1, ..., r_{|T|}) function should have three properties: (a) it should be monotonically non-decreasing in each argument, (b) it should achieve a value close to its maximum as long as any one of its parameters is "high", and (c) it should be differentiable, for ease of analysis. In Eq. 5, we use the sigmoid function to implement this: σ(x) = 1/(1 + e^{−x}). This monotonically increases from 0 to 1, and achieves values greater than 1 − ε once x is greater than an ε-dependent threshold. In addition, the sigmoid enables fine control of the degree of "explanation" required for each edge (discussed below) and allows for easy extensions to more complex label types and extra features (Section 5), all of which make it our preferred choice for the softmax.

In a nutshell, our modeling assumption can be stated as follows: it is better to explain as many friendships as possible, rather than to explain a few friendships really well. Eq. 3 is maximized if the softmax function achieves a high value for each edge u ∼ v, i.e., if each edge is "explained". This is achieved if the sum Σ_{t∈T} r(u,v,t) is more than the required threshold, even for one label type — in other words, when the product S_{utℓ} S_{vtℓ} is 1 for even one label ℓ — that is, when there exists any label ℓ that both u and v share. The parameter α controls the degree of explanation needed for each edge; a small α forces the learning algorithm to be very sure that u and v share one or more label types, while with a large α, a single matching label type is enough. Empirical results shown later in Section 6 prove that large α values perform better, suggesting that even a single matching label type is enough to explain an edge. The parameter c in Eq. 5 can be thought of as the probability of matching on an unknown label type, distinct from the five we consider. Higher values of c can be used to model uncertainty about whether the available label types form an exhaustive set of reasons for friendships. All our experiments use c = 0, reflecting our belief in property (P1); the accuracy of predictions under this setting (shown later in Section 6) suggests that (P1) indeed holds true.

Further intuition can be gained by considering a node u whose labels are completely unknown, but whose friends' labels are completely known (see Figure 1). As we discussed earlier in Section 1, label propagation would infer the hometown of u to be the most common hometown among her friends (i.e., H), the current city to be the most common current city among friends (i.e., C′), and so on. However, such an inference leaves u's friendships from C completely unexplained. Our proposed method rectifies this; Eq. 3 will be maximized by correctly inferring H and C as u's hometown and current city respectively, since H is enough to explain all friendships with the hometown friends, and the marginal extra benefit obtained from explaining these same friendships a little better by using C′ as u's current city is outweighed by the significant benefits obtained from explaining all the friendships from C by setting u's current city to be C.

To summarize, Eq. 4 encapsulates property (P1) by trying to have matching labels between friends; Eq. 5 models property (P2) by enabling any one label type to explain each friendship; and the form of the probability distribution (Eq. 3) uses only existing edges u ∼ v and not all node pairs, and thus is not affected when, say, two nodes with Chicago as their current city are not friends, which reflects the idea that matching label types are necessary but not sufficient (P3).
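The per-edge factor of Eqs. 3-5 can be sketched in a few lines. This is our own illustrative Python, assuming hard (fully observed) label assignments so that r(u,v,t) reduces to an indicator of a shared label of type t; the function names are ours.

```python
import math

# Sketch of Eqs. 3-5: each edge u~v contributes sigmoid(alpha * sum_t r(u,v,t) + c),
# where, for hard assignments, r(u,v,t) = 1 iff u and v share a label of type t.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def edge_score(labels_u, labels_v, alpha=10.0, c=0.0):
    """labels_u, labels_v: {label_type: label}. Returns the softmax of Eq. 5."""
    r_sum = sum(1 for t in labels_u
                if t in labels_v and labels_u[t] == labels_v[t])
    return sigmoid(alpha * r_sum + c)

def log_score(edges, node_labels, alpha=10.0, c=0.0):
    # Unnormalized log-probability of Eq. 3: sum of log-softmax terms over edges.
    return sum(math.log(edge_score(node_labels[u], node_labels[v], alpha, c))
               for u, v in edges)
```

With a large α, a single shared label type already pushes an edge's score close to 1, so further matches add almost nothing — exactly the "one reason per edge" behavior of (P2); with no shared type, the score falls to σ(c).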
4. Inference

The probabilistic description of EDGEEXPLAIN in Eqs. 3-5 can be restated as an optimization problem in the variables S_{utℓ} ∈ {0,1}. In the spirit of Zhu et al. (2003), we propose a relaxation in terms of a real-valued function f, with f_{utℓ} ∈ [0,1] representing the probability that S_{utℓ} = 1, i.e., the probability that user u has label ℓ for label type t. This yields the following optimization:

    Maximize Σ_{u∼v} log( softmax_{t∈T}(r(u,v,t)) )    (6)
    where r(u,v,t) = Σ_{ℓ∈L(t)} f_{utℓ} f_{vtℓ}    (7)
    Σ_{ℓ∈L(t)} f_{utℓ} = 1  ∀t ∈ T    (8)
    f_{utℓ} ≥ 0,    (9)

where softmax(.) is defined as in Eq. 5, and the equation for r(.) is analogous to Eq. 4 but measures the total probability that u and v have the same label for a given label type t.

The problem is not convex in f, but it is convex in f_u = {f_{utℓ} | t ∈ T, ℓ ∈ L(t)} if the distributions f_v are held fixed for all nodes v ≠ u. Hence, we propose an iterative algorithm to infer f. Given f_v for all v ≠ u, finding the optimal f_u corresponds to solving the following problem:

    Maximize g(f_u) = Σ_{v∈Γ(u)} log( softmax_{t∈T}(r(u,v,t)) ),

where the summation is only over the set Γ(u) of the friends of u, and we again restrict f_u to be a set of |T| probability distributions, one for each label type. We note that g(.) is convex and Lipschitz-continuous with constant L = α·|Γ(u)|, where |Γ(u)| is the number of friends of u.

This is a constrained maximization problem with no closed-form solution for f_u. To solve it, we use proximal gradient ascent, an iterative method where in each step we take a step in the direction of the gradient, and then project it back to the probability simplex ∆ = { f_{utℓ} | f_{utℓ} ≥ 0, Σ_{ℓ∈L(t)} f_{utℓ} = 1 ∀t ∈ T }. Specifically, let ∇g represent the gradient of g, with components given by:

    ∂g(f_u)/∂f_{utℓ} = Σ_{v∈Γ(u)} α f_{vtℓ} · σ( −α Σ_{t∈T} Σ_{ℓ∈L(t)} f_{utℓ} f_{vtℓ} − c ).

Let f_u^{(k−1)} = { f_{utℓ}^{(k−1)} | t ∈ T, ℓ ∈ L(t) } be the estimated probability distributions for each of the |T| label types at the end of iteration k−1, and let q_u^{(k)} represent the (possibly improper) ending point of the k-th gradient step:

    q_u^{(k)} = f_u^{(k−1)} + c_k ∇g,

where c_k is a step-size parameter that we could set to a constant c_k = 1/L. The point q_u^{(k)} is now projected to the closest point in ∆:

    f_u^{(k)} = argmin_{q′∈∆} || q_u^{(k)} − q′ ||².

This can be easily achieved in expected linear time over the size of the label set L(t) (Duchi et al., 2008). If only sparse distributions can be stored for each label type (say, only the top k labels for each type), the optimal k-sparse projections can be obtained simply by setting to 0 all but the top k labels for each label type, and then projecting onto the simplex (Kyrillidis et al., 2013).

This algorithm converges to a fixed point, and the function values converge to the optimum at a 1/k rate (Beck & Teboulle, 2009):

    g* − g^{(k)} ≤ L ||f_u^{(0)} − f_u*||² / (2k) ≤ L|T| / k,

where f_u* represents the optimal set of probability distributions, and g* is the optimal function value. An important consequence of the algorithm is that computation of f_u only requires information from f_v for the neighbors v of u. Thus, it is a "local" algorithm that can be easily implemented in distributed message-passing architectures, such as Giraph (Giraph; Ching, 2013).
5. Generalizations

We now discuss some aspects of EDGEEXPLAIN and some generalizations that demonstrate its wide applicability.

RELATED LABEL TYPES. Property (P2) assumes that the reasons for friendship formation are mutually exclusive, but this need not be strictly true. For example, some high school friends could be a subset of hometown friends [1]. Let us again consider Figure 1, but with current city replaced by high school. Suppose that the solid-black nodes represent actual high school friends, and we are trying to infer u's high school. If the small cluster on the right did not exist, then Eq. 3 would be maximized by picking the most common high school among u's friends (i.e., the solid-black nodes), even if they are already explained by a shared hometown; thus, EDGEEXPLAIN would pick the correct high school. On the other hand, if some friendships would remain unexplained without a shared high school (e.g., the small cluster in Figure 1), then it is not obvious whether we should prefer a high school that explains these edges or a high school that represents a large segment of hometown friends. The parameter α modulates this trade-off, with a higher value of α emphasizing the explanation of all edges as against the explanation of several edges a little better. The choice of α must depend on the characteristics of the social network; for the Facebook network, the best empirical results are achieved for large α (shown later in Section 6), suggesting that many of our label types are indeed mutually exclusive.

[1] The relationship between high school and hometown is in fact more complicated. The high school could be within driving distance of the hometown, but not in it; and sometimes even this does not hold.

INCORPORATING USER FEATURES. EDGEEXPLAIN easily generalizes to broader settings with multiple user features, such as group memberships, topics of interest, keywords, or pages liked by the user. As an example, consider group memberships of users. Intuitively, if most members of a group come from the same college, then it is likely a college-friends group, and this can aid inference for group members whose college is unknown. This can be easily handled by creating a special node for each group, and creating "friendship" edges between the group node and its members. EDGEEXPLAIN will infer labels for the group node as well, and will explain its "friendships" via the college label. This, in turn, will influence college inference for group members with unknown college labels. The importance of such group membership features can also be tuned, as described next.

INCORPORATING EDGE FEATURES. There are several situations where edge-specific features could be useful. First, we may want to give more importance to certain kinds of edges, such as the group-membership edges mentioned above. Second, some features could be important for one label type but not another: e.g., the age difference between friends could be useful for inferring high school but not employer. All these situations can be easily handled by modifying Eq. 4 to include an edge-specific and label-type-specific weight. The corresponding modifications to the inference method are trivial.

6. Experiments

Previously, we provided intuition and examples supporting the claim that EDGEEXPLAIN is better suited to inference of our desired label types than vanilla label propagation. In this section, we demonstrate this via empirical evidence on a billion-node graph.

DATA. We ran experiments on a large subgraph of the Facebook social network, consisting of over 1.1 billion users and their friendship edges. From the public profile of each user, we extract the hometown, current city, high school, college, and employer, whenever these are available. The dimensionality of our five label types ranges from almost 900K to over 6.3M. We describe below in IMPLEMENTATION DETAILS our process for generating the edges. This forms our base dataset.

EXPERIMENTAL METHODOLOGY. The set of users is randomly split into five parts and experimental accuracy is measured via 5-fold cross-validation, with the known profile information from four folds being used to predict labels for all types for users in the fifth fold. Results over the various folds are identical to three decimal places. All differences are therefore significant, and we do not show variances as they are too small to be noticeable.

In each experiment, we run inference on the training set and compute a ranking of labels for each label type for each user. This ranking is provided by f computed for label propagation (Eq. 1) and EDGEEXPLAIN (Eqs. 6-9), respectively. We then measure recall at the top-1 and top-3 positions, i.e., we measure the fraction of (user, label type) pairs in the test set where the predicted top-ranked label (or any of the top-3 labels) matches the actual user-provided label. For reasons of confidentiality, we only present the lift in recall values of EDGEEXPLAIN as compared to label propagation.

IMPLEMENTATION DETAILS. We implemented EDGEEXPLAIN in Giraph (Giraph; Ching, 2013), an iterative graph processing system based on the Bulk Synchronous Processing model (Malewicz et al., 2010; Valiant, 1990). The entire set of nodes is split among 200 machines, and in each iteration, every node u sends the probability distributions f_u to all friends of u. To limit the communication overhead, we implemented two features: (a) for each user u and label type t, the multinomial distribution f_{ut.} was clipped to retain only the top 8 entries optimally (Kyrillidis et al., 2013), and (b) the friendship graph is sparsified so as to retain, for each user u, the top K friends whose ages are closest to that of u. This choice of friends is guided by the intuition that friends of similar age are most likely to share certain label types such as high school and college. We find that clipping the distributions makes little difference to accuracy while significantly improving running time. However, the value of K matters significantly, and we detail these effects next.

RECALL OF EDGEEXPLAIN. Figure 2 shows recall as a function of the varying number of friends K, against a baseline of EDGEEXPLAIN with K = 20. We observe that recall increases up to a certain K and then decreases — K = 100 for recall at 1, and K = 200 for recall at 3. This demonstrates both the importance and the limits of scalability: increasing the number of friends enables better inference, but beyond a point, more friends increase noise. Thus, K = 100 friends appear to be enough for inference under EDGEEXPLAIN.
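The recall@k measurement described above can be sketched as follows (illustrative Python; the data-structure names are ours, not the paper's):

```python
# Recall@k over held-out (user, label type) pairs: count a hit when the true
# label appears among the top-k predictions ranked by the inferred scores.
def recall_at_k(predictions, truth, k):
    """predictions: {(user, type): {label: score}}; truth: {(user, type): label}."""
    hits = 0
    for key, true_label in truth.items():
        scores = predictions.get(key, {})
        ranked = sorted(scores, key=scores.get, reverse=True)
        hits += true_label in ranked[:k]
    return hits / len(truth) if truth else 0.0
```

The lift reported in the figures would then be the ratio of two such recall values minus one, computed once for EDGEEXPLAIN and once for the baseline.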
[Figure 2 (two panels: (a) Recall at 1, (b) Recall at 3; bars for K = 20, 50, 100, 200, 400 across hometown, current city, high school, college, employer). Caption: Recall of EDGEEXPLAIN for graphs built with different numbers of friends K: The plot shows lift in recall with respect to a fixed baseline of EDGEEXPLAIN with K = 20. Increasing K increases recall up to a point, but then the extra friends introduce noise which hurts accuracy.]

Figure 2 also shows an increasing trend from hometown to employer in the degree of improvement obtained over the K = 20 baseline. This is because (a) the baseline itself is best for hometown and worst for employer, but also because (b) Facebook users appear to have many more friends from label types other than from their current employer. The effect of this latter observation is that if we only have a small K, it is very likely that the few friends from the same current employer are not included in that limited set of friends (which we empirically verified). As K increases and such same-employer edges become available, EDGEEXPLAIN can easily learn the reason for these edges (hence the dramatic increase in recall), but label propagation will likely be confused by the overall distribution of different employers among all friends and therefore does not benefit equally from adding more friends, as we show next.

COMPARISON WITH LABEL PROPAGATION. Figure 3 shows the lift in recall achieved by EDGEEXPLAIN over label propagation as we increase K for both. We observe similar performance of both methods for hometown and current city, but increasing improvements for high school, college, and employer. This again points to the first two being easier to infer, with the difficulty of inference increasing with the latter label types. With fewer employer-based friendships, the prototypical example of Figure 1 would also occur frequently, with label propagation likely picking common employers of (say) hometown friends instead of the less common friendships based on the actual employer. By attempting to explain each friendship, EDGEEXPLAIN is able to infer the employer even under such difficult circumstances, and the ability to perform well even for under-represented label types makes EDGEEXPLAIN particularly attractive.

INCLUSION OF EXTRA FEATURES. In Section 5, we discussed how extra features could be used within the EDGEEXPLAIN framework. In particular, we showed how the fact that some users are members of groups can be used to infer (say) their college, if the group turns out to be a college-specific group. Group memberships are extensive and provide information that is orthogonal to friendships; thus, a priori, one would expect the addition of group membership features to have a significant impact on label inference.

Table 1 shows the lift in recall for EDGEEXPLAIN when group memberships are used in addition to K = 100 friends. While the addition of group memberships increases the size of the graph by ≈ 25%, the observed benefits for recall are minor: a maximum lift of only 1.2% for employer inference, and indeed reduced recall at 1 for several label types. Note that the lift in recall would have appeared very significant had we compared it to label propagation with K = 100; however, this gain largely disappears when the friendships are considered in the framework of EDGEEXPLAIN. Thus, it is not merely the scalability of EDGEEXPLAIN, but also the careful modeling of properties (P1)-(P3) that makes group membership redundant.

Table 1. Lift in recall from using group memberships: Inclusion of group membership barely improves recall@3, even though it is an orthogonal feature with wide coverage. Thus, information about label types is already encoded in the network structure, and careful modeling via EDGEEXPLAIN is sufficient to extract it.

    LABEL TYPE      RECALL AT 1    RECALL AT 3
    HOMETOWN        -0.1%          0.7%
    CURRENT CITY    0.4%           1.0%
    HIGH SCHOOL     0.1%           0.8%
    COLLEGE         -0.6%          1.0%
    EMPLOYER        -2.8%          1.2%

Given the a priori expectations of the impact of group memberships, this surprising result suggests that information regarding our label types is already encoded in the structure of the social network, and hence the orthogonal information from the group memberships actually turns out to be redundant.
[Figure 3: bar charts (plot residue removed). Panels: (a) Recall at 1, (b) Recall at 3; bars for K = 20, 50, 100, 200, 400 over the label types hometown, current city, high school, college, and employer.]

Figure 3. Lift of EDGEEXPLAIN over Label Propagation: Increasing the number of friends K benefits EDGEEXPLAIN much more than label propagation for high school, college, and especially employer.

[Figure 4: line plot (residue removed), probability of correct inference vs. fraction of friends sharing the user's true label, one curve per label type. Figure 5: bar chart (residue removed), lift in recall at 1 per label type.]

Figure 4. Probability of correctly inferring (in the top 3) the value of a given label type t for a user, given the fraction of friends with known label for t who actually share the user's label for t: All label types are broadly similar, with a fraction of 0.1 usually being sufficient for inference. For fraction > 0.2, the plot flattens out.

Figure 5. Effect of α: Lift in recall at 1 is plotted for different values of α, with respect to α = 0.1. The best results are for α ∈ [10, 40].

THE LIMITS OF RESOLUTION. Our model theoretically should be able to handle any number of label types, but empirically this may not hold true for our network. How many friends sharing a certain label type (say, the same college) does a user need to have in order to correctly infer the value of that label type? To answer this, we select, for each user, the set of friends whose label for the given label type t is known, and we compute the fraction that actually shared the user's label for t. Figure 4 shows the probability that EDGEEXPLAIN correctly infers the user's label as a function of this fraction (i.e., the correct label is among the top 3 predictions). All label types are similar, though high school is somewhat easier and employer harder; having 10−15% of friends sharing a user's label is sufficient to infer the label in our graph. Note that certain label types are more likely to be publicly declared than others, and this explains the differences in recall observed earlier.

EFFECT OF α. Figure 5 shows the lift in recall at 1 for various values of the parameter α, with respect to α = 0.1. Performance generally improves with increasing α. Results for recall at 3 are qualitatively similar, though the effect is more muted. We find that α ∈ [10, 40] offers the best results, and EDGEEXPLAIN is robust to the specific choice of α within this range. Recall that with large α, a single matching label is enough to explain an edge, while with small α, multiple matching labels may be needed. Thus, the outperformance of large α provides strong empirical validation of property (P2) (on our network).

Figure 6. Running time increases linearly with K.

RUNNING TIME. Figure 6 shows the wall-clock time as a function of K. The running time should depend linearly on the graph size, which grows almost linearly with K; as expected, the plot is linear, with deviations due to garbage collection stoppages in Java.
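The intuition behind α can be illustrated with a toy computation. The sketch below is not the paper's exact objective; it assumes a generic "soft OR" in which an edge's explanation score is a logistic function of the summed per-type agreements, with α controlling how sharply a single match suffices (the function name and the 0.5 offset are illustrative choices).

```python
import math

def soft_or(matches, alpha):
    """Smooth OR over per-label-type agreement scores in [0, 1].

    With large alpha, one fully matching label type is enough to
    (nearly) explain the edge; with small alpha, several matches
    are needed. Illustrative stand-in, not the exact EDGEEXPLAIN
    objective.
    """
    # Logistic squashing of the summed agreements, shifted so that
    # zero matches gives a low score.
    return 1.0 / (1.0 + math.exp(-alpha * (sum(matches) - 0.5)))

one_match = [1.0, 0.0, 0.0, 0.0, 0.0]  # one matching label type out of five
print(soft_or(one_match, alpha=10))    # ~0.99: a single match explains the edge
print(soft_or(one_match, alpha=0.5))   # ~0.56: a single match is not enough
```

This mirrors the observation above: a large α makes one matching label type sufficient to explain an edge, which is exactly property (P2).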
7. Conclusions

We proposed the problem of jointly inferring multiple correlated label types in a large network and described the problems with existing single-label models. We noted that one particular failure mode of existing methods in our problem setting is that edges are often created for a reason associated with a particular label type (e.g., in a social network, two users may link because they went to the same high school, but they did not go to the same college). We identified three network properties that model this phenomenon: edges are created for a reason (P1), they are generally created only for one reason (P2), and sharing the same value for a label type is necessary but not sufficient for having an edge between two nodes (P3).

We introduced EDGEEXPLAIN, which carefully models these properties. It leverages a gradient-based method for collective inference which allows for fast iterative inference that is equivalent in running time to basic label propagation. Our experiments with a billion-node subset of the Facebook graph amply demonstrate the benefits of EDGEEXPLAIN, with significant improvements across a set of different label types. Our further analysis validates many of the properties and intuitions we had about modeling networks, primarily that one can achieve significant improvements if one considers and models the reason an edge exists. Whether one is interested in inferring one or multiple label types, modeling these explanations will have a significant impact on the accuracy of the final predictions.
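For contrast, the single-label baseline discussed throughout can be written in a few lines. The sketch below is the standard majority-vote form of label propagation, run independently for one label type at a time; it is precisely this per-type independence that ignores properties (P1)-(P3). The function name and synchronous-update scheme are illustrative, not the exact baseline implementation.

```python
from collections import Counter

def label_propagation(adj, seed_labels, iters=10):
    """Majority-vote label propagation for a single label type.

    adj:         dict node -> list of neighbours
    seed_labels: dict node -> known label (clamped, never updated)
    Unlabelled nodes repeatedly adopt the most common label among
    their neighbours' current labels.
    """
    labels = dict(seed_labels)
    for _ in range(iters):
        updates = {}
        for node, nbrs in adj.items():
            if node in seed_labels:
                continue  # keep observed labels fixed
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if votes:
                updates[node] = votes.most_common(1)[0][0]
        labels.update(updates)  # synchronous update
    return labels

# The failure mode of Figure 1 in miniature: the hometown group
# outvotes the current-city group, regardless of why each edge exists.
adj = {"u": ["a", "b", "c"], "a": ["u"], "b": ["u"], "c": ["u"]}
seeds = {"a": "H", "b": "H", "c": "C"}
print(label_propagation(adj, seeds)["u"])  # "H"
```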