Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs

Baichuan Zhang∗, Sutanay Choudhury†, Mohammad Al Hasan‡, Xia Ning‡, Khushbu Agarwal†, Sumit Purohit†, Paola Pesantez Cabrera§

arXiv:1601.03778v2 [cs.LG] 15 Feb 2016

∗ Department of Computer and Information Science, Indiana University - Purdue University Indianapolis, USA, [email protected]. The work was conducted during the author's internship at Pacific Northwest National Laboratory.
† Pacific Northwest National Laboratory, Richland, WA, USA, {Sutanay.Choudhury,Khushbu.Agarwal,Sumit.Purohit}@pnnl.gov
‡ Department of Computer and Information Science, Indiana University - Purdue University Indianapolis, USA, {alhasan,xning}@cs.iupui.edu
§ Department of Computer Science, Washington State University, Pullman, WA, USA, [email protected]

Abstract

Link prediction, or predicting the likelihood of a link in a knowledge graph based on its existing state, is a key research task. It differs from a traditional link prediction task in that the links in a knowledge graph are categorized into different predicates, and the link prediction performance of different predicates in a knowledge graph generally varies widely. In this work, we propose a latent feature embedding based link prediction model which considers the prediction task for each predicate disjointly. To learn the model parameters it utilizes a Bayesian personalized ranking based optimization technique. Experimental results on large-scale knowledge bases such as YAGO2 show that our link prediction approach achieves substantially higher performance than several state-of-the-art approaches. We also show that for a given predicate the topological properties of the knowledge graph induced by the given predicate edges are key indicators of the link prediction performance of that predicate in the knowledge graph.

1 Introduction

A knowledge graph is a repository of information about entities, where entities can be anything of interest such as people, locations, organizations or even scientific topics, concepts, etc. An entity is frequently characterized by its association with other entities. As an example, capturing the knowledge about a company involves listing its products, location and key individuals. Similarly, knowledge about a person involves her name, date and place of birth, affiliation with organizations, etc. Resource Description Framework (RDF) is a frequent choice for capturing the interactions between two entities. An RDF dataset is equivalent to a heterogeneous graph, where each vertex and edge can belong to different classes. The class information captures taxonomic hierarchies between the types of various entities and relations. As an example, a knowledge graph may identify Kobe Bryant as a basketball player, while its ontology will indicate that a basketball player is a particular type of athlete. Thus, one will be able to query for famous athletes in the United States and find Kobe Bryant.

The past few years have seen a surge in research on knowledge representations and algorithms for building knowledge graphs. For example, Google Knowledge Vault [6] and IBM Watson [9] are comprehensive knowledge bases which are built in order to answer questions from the general population. As evident from these works, building a domain specific knowledge graph requires a multitude of efforts, namely triple extraction from natural language text, entity and relationship mapping [25], and link prediction [21]. Specifically, triples extracted from text data sources using state-of-the-art techniques such as OpenIE [8] and semantic role labeling [5] are extremely noisy, and simply adding noisy triple facts into a knowledge graph destroys its purpose. So computational methods must be devised for deciding which of the extracted triples are worthy of insertion into a knowledge graph. There are several considerations for this decision making: (1) trustworthiness of the data sources; (2) a belief value reported by a natural language processing engine expressing its confidence in the correctness of parsing; and (3) prior knowledge of subjects and objects. This particular work is motivated by the third factor.

Link prediction in a knowledge graph is simply a machine learning approach for utilizing prior knowledge of subjects and objects, as available in the knowledge graph, for estimating the confidence of a candidate triple. Consider the following example: given a social media post "I wish Tom Cruise was the president of United States", a natural language processing engine will extract a triple ("Tom Cruise", "president of", "United States"). On the other hand, a web crawler may find the fact that "Tom Cruise is president of Downtown Medical", resulting in the triple ("Tom Cruise", "president of", "Downtown Medical"). Although we generally do not have any information about the trustworthiness of the sources, our prior knowledge of the entities mentioned in these triples will enable us to decide that the first of the above triples is possibly wrong. Link prediction provides a principled approach for such decision-making. Also note that, once we decide to add a triple to the knowledge graph, it is important to have a confidence value associated with it.

As we use a machine learning approach to compute the confidence of triple facts, it is important that we quantitatively understand the degree of accuracy of our prediction [28]. This matters because, for the same knowledge graph, the prediction accuracy level varies from predicate to predicate. As an example, predicting one's school or workplace can be a much harder task than predicting one's liking for a local restaurant. Therefore, given two predicates "worksAt" and "likes", we expect to see widely varying accuracy levels. Also, the average accuracy levels vary widely from one knowledge graph to another. The desire to obtain a quantitative grasp on prediction accuracy is complicated by a number of reasons: 1) Knowledge graphs constructed from web text or using machine reading approaches can have a very large number of predicates, which makes manual verification difficult [6]; 2) The creation of predicates, or the resultant graph structure, is strongly shaped by the ontology and by the conversion process used to generate RDF statements from a logical record in the data. Therefore, the same data source can be represented in very different models, and this leads to different accuracy levels for the same predicate; 3) The effectiveness of knowledge graphs has inspired their construction from every imaginable data source: product inventories (at retailers such as Wal-mart), online social networks (such as Facebook), and web pages (Google's Knowledge Vault). As we move from one data source to another, it is critical to understand what accuracy levels we can expect from a given predicate.

In this paper, we use a link prediction¹ approach for computing the confidence of a triple from the prior knowledge about its subject and object. Many works exist for link prediction [11] in social network analysis [4], but they differ from link prediction in a knowledge graph: in the former, all the links are semantically similar, whereas in the latter the semantics of the links differ widely depending on the predicate. So, existing link prediction methods are not very suitable for this task. We build our link prediction method by borrowing solutions from recommender system research. Such methods accept a user-item matrix and, for a given user-item pair, return a score indicating the likelihood of the user purchasing the item. Likewise, for a given predicate, we consider the set of subjects and objects as a user-item matrix and produce a real-valued score to measure the confidence of the given triple. For training the model we use the Bayesian personalized ranking (BPR) based embedding model [23], which has been a major work in recommender systems. In addition, we study the performance of our proposed link prediction algorithm in terms of topological properties of the knowledge graph and present a linear regression model to reason about its expected level of accuracy for each predicate.

¹ We use link prediction and link recommendation interchangeably.

Our contributions in this work are outlined below:

1. We implement a Link Prediction approach for estimating confidence for triples in a Knowledge Graph. Specifically, we borrow from successful approaches in the recommender systems domain, adapt the algorithms for knowledge graphs and perform a thorough evaluation on a prominent benchmark dataset.

2. We propose a Latent Feature Embedding based link recommendation model for the prediction task and utilize a Bayesian Personalized Ranking based optimization technique for learning models for each predicate (Section 4). Our experiments on the well known YAGO2 knowledge graph (Section 5) show that the BPR approach outperforms other competing approaches for a significant set of predicates (Figure 1).

3. We apply a linear regression model to quantitatively analyze the correlation between the prediction accuracy for each predicate and the topological structure of the induced subgraph of the original Knowledge Graph. Our studies show that metrics such as clustering coefficient or average degree can be used to reason about the expected level of prediction accuracy (Section 5.3, Figure 2).

2 Related Work

There is a large body of work on link prediction in knowledge graphs. In terms of methodology, factorization based and related latent variable models [3,7,13,22,25], graphical models [14], and graph feature based methods [17,18] have been considered.

A large number of works focus on factorization based models. The common thread among the factorization methods is that they explain the triples via latent features of entities. [2] presents a tensor based model that decomposes each entity and predicate in knowledge graphs as a low dimensional vector. However, such a method fails to consider the symmetry property of the tensor. In order to solve this issue, [22] proposes a relational latent feature model, RESCAL, an efficient approach which uses a tensor factorization model that takes the inherent structure of relational data into account. By leveraging relational domain knowledge about entity type information, [3] proposes a tensor decomposition approach for relation extraction in knowledge bases which is highly efficient in terms of time complexity. In addition, various other latent variable models, such as neural network based methods [6,27], have been explored for the link prediction task. However, the major drawback of neural network based models is their complexity and computational cost in model training and parameter tuning. Many of these models require tuning a large number of parameters, so finding the right combination of these parameters is often considered more of an art than a science.

Recently graphical models, such as Probabilistic Relational Models [10], Relational Markov Networks [29], and Markov Logic Networks [14,24], have also been used for link prediction in knowledge graphs. For instance, [24] proposes a Markov Logic Network (MLN) based approach, which is a template language for defining potential functions on a knowledge graph by logical formulas. Despite its utility for modeling knowledge graphs, issues such as rule learning difficulty, tractability problems, and parameter estimation pose implementation challenges for MLNs.

Graph feature based approaches assume that the existence of an edge can be predicted by extracting features from the observed edges in the graph. Lao and Cohen [17,18] propose the Path Ranking Algorithm (PRA), which performs random walks on the graph and computes the probability of each path. The main idea of PRA is to use these path probabilities as supervised features for each entity pair, and to use any suitable classification model, such as logistic regression or SVM, to predict the probability of a missing edge between an entity pair in a knowledge graph.

It has been demonstrated [1] that no single approach emerges as a clear winner. Instead, the merits of factorization models and graph feature models are often complementary. Thus combining the advantages of different approaches for learning knowledge graphs is a promising option. For instance, [20] proposes an additive model, which is a linear combination of RESCAL and PRA. The combination not only decreases the training time but also increases the accuracy. [15] combines a latent feature model with an additive term to learn from latent and neighborhood-based information on multi-relational data. [6] fuses the outputs of PRA and a neural network model as features for training a binary classifier. Our work strongly aligns with this combination approach. In this work, we build on matrix factorization based techniques that have proved successful for recommender systems, and we plan to incorporate graph based features in future work.

3 Background and Problem Statement

Definition 3.1. We define the knowledge graph as a collection of triple facts $G = (S, P, O)$, where $S$ and $O$ are the sets of subject and object entities ($s \in S$, $o \in O$) and $P$ is the set of predicates or relations between them ($p \in P$). $G(s, p, o) = 1$ if there is a direct link of type $p$ from $s$ to $o$, and $G(s, p, o) = 0$ otherwise.

Each triple fact in the knowledge graph is a statement interpreted as "A relationship $p$ holds between entities $s$ and $o$". For instance, the statement "Kobe Bryant is a player of LA Lakers" can be expressed by the triple fact ("Kobe Bryant", "playsFor", "LA Lakers").

Definition 3.2. For each relation $p \in P$, we define $G_p(S_p, O_p)$ as a bipartite subgraph of $G$, where the corresponding sets of entities $s_p \in S_p$, $o_p \in O_p$ are connected by relation $p$, namely $G_p(s_p, o_p) = 1$.

Problem Statement: For every predicate $p \in P$ and given an entity pair $(s, o)$ in $G_p$, our goal is to learn a link recommendation model $M_p$ such that $x_{s,o} = M_p(s, o)$ is a real-valued score.

Because the produced real-valued score is not normalized, we compute the probability $\Pr(y^p_{s,o} = 1)$, where $y^p_{s,o}$ is a binary random variable that is true iff $G_p(s, o) = 1$. We estimate this probability using the logistic function as follows:

$$\Pr(y^p_{s,o} = 1) = \frac{1}{1 + \exp(-x_{s,o})} \qquad (3.1)$$

Thus we interpret $\Pr(y^p_{s,o} = 1)$ as the probability that a vertex (the subject) $s$ in the knowledge graph $G$ is in a relationship of the given type $p$ with another vertex (the object) $o$.

4 Methods

In this section, we describe our model, namely the Latent Feature Embedding Model with a Bayesian Personalized Ranking (BPR) based optimization technique, that we propose for the task of link prediction in a knowledge graph. In our link prediction setting, for a given predicate $p$, we first construct its bipartite subgraph $G_p(S_p, O_p)$.
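To make Definitions 3.1 and 3.2 and Equation 3.1 concrete, the following minimal Python sketch groups triple facts by predicate to obtain the edge set of each bipartite subgraph $G_p$ and maps a raw score to a probability with the logistic function. The function names (`build_bipartite_subgraphs`, `entity_sets`, `link_probability`), the toy triples, and the score value are illustrative assumptions and are not taken from the released software.

```python
import math
from collections import defaultdict


def build_bipartite_subgraphs(triples):
    """Group (s, p, o) triple facts by predicate p.

    Returns a dict mapping each predicate p to the edge set of G_p,
    i.e. the set of (s, o) pairs with G(s, p, o) = 1.
    """
    subgraphs = defaultdict(set)
    for s, p, o in triples:
        subgraphs[p].add((s, o))
    return dict(subgraphs)


def entity_sets(edges):
    """Return (S_p, O_p): the subjects and objects that appear in G_p."""
    subjects = {s for s, _ in edges}
    objects = {o for _, o in edges}
    return subjects, objects


def link_probability(score):
    """Equation 3.1, evaluated in a numerically stable way."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)  # avoids overflow for large negative scores
    return z / (1.0 + z)


# Toy knowledge graph; the entity and predicate strings are illustrative only.
triples = [
    ("Kobe Bryant", "playsFor", "LA Lakers"),
    ("Tom Cruise", "presidentOf", "Downtown Medical"),
]
subgraphs = build_bipartite_subgraphs(triples)
S_p, O_p = entity_sets(subgraphs["playsFor"])

# Hypothetical raw score x_{s,o}, standing in for the output of a trained M_p.
x_so = 1.2
print(sorted(S_p), sorted(O_p), round(link_probability(x_so), 3))
```

Storing each $G_p$ as a set of (subject, object) pairs keeps the per-predicate models independent, which matches the paper's choice to treat each predicate disjointly.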
Then we learn the optimal low dimen- based distance function: p p p sionalembeddingsforitscorrespondingsubjectandob- ject entities s ∈ S , o ∈ O by maximizing a ranking (4.3) p p p p bSatosechdadsitsictanGcreadfuienncttioDne.scTehnet l(eSaGrnDin).gTprhoeceSsGsDrelibeassoend BPR=mΘapxP(sp,o+p,o−p)∈Dplnσ(xsp,o+p −xsp,o−p)−λΘp ||Θp||2 optimization technique iteratively updates the low di- where Dp is a set of samples generated from the mensionalrepresentationofspandop untilconvergence. training data for predicate p, Gp(sp,o+p) = 1 and Then the learned model is used for ranking the unob- Gp(sp,o−p)=0. Andxsp,o+p andxsp,o−p arethepredicted served triple facts in descending order such that triple scores of subject s on objects o+ and o− respectively. p p p facts with higher score values have a higher probability We use the proposed latent feature based embedding of being correct. model shown in Equation 4.2 to compute x and sp,o+p xsp,o−p respectively. The last term in Equation 4.3 is a 4.1 Latent Feature Based Embedding Model l -norm regularization term used for model parameters 2 For each predicate p, the model maps both its corre- Θ = {Up,Vp,bp} to avoid overfitting in the learning p sponding subject and object entites sp and op into low- process. In addition, the logistic function σ(.) in dimensional continuous vector spaces, say Up ∈ IR1×K Equation 4.3 is defined as σ(x)= 1 . andVp ∈IR1×K respectively. Wemeasurethsecompati- Notice that the Equation 4.3 i1s+de−iffxerentiable, thus o bilitybetweensubjectspandobjectopasdotproductof weemploythewidelyusedSGDtomaximizetheobjec- itscorrespondinglatentvectorswhichisgivenasbelow: tive. Inparticular,ateachiteration,forgivenpredicate p, we sample one observed entity pair (s ,o+) and one p p unobserved one (s ,o−) using uniform sampling tech- (4.2) x =(Up)(Vp)T +bp p p sp,op s o o nique. Then we iteratively update the model param- where Up ∈ IR|S|×K, Vp ∈ IR|O|×K, and bp ∈ eters Θp based on the sampled pairs. 
Specifically, for IR|O|×1. |S| and |O| denote the size of subject and each training instance, we compute the derivative and update the corresponding parameters Θ by walking object associated with predicate p respectively. K is p the number of latent dimensions and bp ∈ IR is a bias along the ascending gradient direction. o For each predicate p, given a training triple term associated with object o. Given predicate p, the (s ,o+,o−), the gradient of BPR objective in Equa- higher the score of xsp,op, the more similar the entities tiopn 4p.3 wpith respect to Up, Vp , Vp , bp , bp can be sp and op in the embedded low dimensional space, and s o+ o− o+ o− computed as follows: the higher the confidence to include this triple fact into knowledge base. ∂BPR = ∂lnσ(xsp,o+p −xsp,o−p) −2λpUp 4.2 Bayesian Personalized Ranking ∂Up ∂Up s s s s Iainns tcihmoelplaelibCcoiotrmaftmeiveederbcfiaelctpkerl/aibntifgno,armrpyo,sfsietoeimvdebe-aoucnskley.rsdFaotnoarlyiesbxuakymnopbwluent, = ∂∂lnσσ(x(sxps,po,+po+p−−xsxps,po,−po−p)) × ∂∂σ((xxsspp,o,o+p+p−−xxsspp,o,o−p−p)) do not rate items. Motivated by [23], we employ ×∂(xsp,o+p−xsp,o−p) −2λpUp ∂Usp s s Bayesian Personalized Ranking (BPR) based approach 1 fdoormmaoind,elglievaernniunsge.r-Sitpeemcifimcaaltlryi,xi,nBrePcRombmaseenddearpspyrsotaecmh = σ(xsp,o+p −xsp,o−p) ×σ(xsp,o+p −xsp,o−p) assigns the preference of user for purchased item with (cid:0)1−σ(xsp,o+p −xsp,o−p)(cid:1)×(Vop+−Vop−)−2λpsUsp hthigishecrosnctoerxet,thwaen uasns-ipgunrcohbasseerdveitdemtr.ipLleikefawcitsse,huingdheerr = (cid:0)1−σ(xsp,o+p −xsp,o−p)(cid:1)(Vop+−Vop−)−2λpsUsp (4.4) score than unobserved triple facts in knowledge base. We assume that unobserved facts are not necessarily We obtain the following using similar chain rule negative, rather they are “less preferable” than the derivation. observed ones. 
For our task, in each predicate p, we denote the ∂BPR observed subject/object entity pair as (sp,o+p) and (4.5) ∂Vp =(cid:0)1−σ(xsp,o+p−xsp,o−p)(cid:1)×Usp−2λpo+Vop+ unobserved one as (s ,o−). The observed facts in our o+ p p case are the existing link between sp and op given Gp (4.6) andunobservedonesarethemissinglinkbetweenthem. ∂BPR Given this fact, BPR maximizes the following ranking ∂Vp =(cid:0)1−σ(xsp,o+p −xsp,o−p)(cid:1)×(−Usp)−2λpo−Vop− o− Algorithm 1 Bayesian Personalized Ranking Based ∂BPR Latent Feature Embedding Model (4.7) ∂bp =(cid:0)1−σ(xsp,o+p −xsp,o−p)(cid:1)×1−2λpo+bpo+ Input: latent dimension K, G, target predicate p o+ Output: Up, Vp, bp (4.8) 1: Giventargetpredicatepandentireknowledgegraph ∂BPR ∂bpo− =(cid:0)1−σ(xsp,o+p −xsp,o−p)(cid:1)×(−1)−2λpo−bpo− 2: mG,=connustmrubcetriotsf sbuipbajerctitteenstuitbigersaipnhG, Gpp Next, the parameters are updated as follows: 3: n = number of object entities in Gp 4: Generate a set of training samples Dp = {(s ,o+,o−)} using uniform sampling technique ∂BPR p p p (4.9) Usp =Usp+α× ∂Up 5: Initialize Up assizem×K matrix with0meanand s standard deviation 0.1 6: Initialize Vp as size n×K matrix with 0 mean and ∂BPR (4.10) Vp =Vp +α× stardard deviation 0.1 o+ o+ ∂Vp o+ 7: Initializebp assizen×1columnvectorwith0mean and stardard deviation 0.1 ∂BPR (4.11) Vop− =Vop− +α× ∂Vop− 98:: forUpadlla(tsepU,osp+pb,oa−pse)d∈onDpEqduoation 4.9 10: Update Vp based on Equation 4.10 (4.12) bpo+ =bpo+ +α× ∂∂BbPpo+R 1112:: UUppddaattee Vbpooop++−bbaasseeddoonnEEqquuaattiioonn44..1121 ∂BPR 13: Update bpo− based on Equation 4.13 (4.13) bp =bp +α× 14: end for o− o− ∂bpo− 15: return Up, Vp, bp where α is the learning rate. 4.3 Pseudo-code and Complexity Analysis 5 Experiments and Results The pseudo-code of our proposedlink prediction model This section presents our experimental analysis of the is described in Algorithm 1. 
It takes the knowledge Algorithm 1 for thirteen unique predicates in the well graph G and a specific target predicate p as input and known YAGO2 knowledge graph [12]. We construct a generates the low dimensional latent matrices Up, Vp, model for each predicate and describe our evaluation bp asoutput. Line1constuctsthebipartitesubgraphof strategies, including performance metrics and selection predicatep,Gp givenentireknowledgegraphG. Line2- ofstate-of-the-artmethods forbenchmarking insection 3 compute the number of subject and object entities as 5.1. We aim to answer two questions through our mandninresultantbipartitesubgraphGprespectively. experiments: Line 4 generates a collection of triple samples using uniform sampling technique. Line 5-7 initialize the 1. Howdoesourapproachcomparewithrelatedwork matrices Up, Vp, bp using Gaussian distribution with for link recommendation in knowledge graph? 0 mean and 0.1 standard deviation, assuming all the entries in Up, Vp and bp are independent. Line 8-14 2. For a predicate p, can we reason about the link updatecorrespondingrowsofmatricesUp,Vp,bpbased prediction model performance M in terms of the p on the sampled instance (s ,o+,o−) in each iteration. structural metrics of the bipartite graph G ? p p p p As the sample generation step in line 4 is prior to the modelparameterlearning,thustheconvergencecriteria Table 1 shows the statistic of various YAGO2 of Algorithm 1 is to iterate over all the sampled triples relations used in our experiments. # Subjects and # in D . Objects represent the number of subject and object p Given the constructed G as input, the time entities associated with its corresponding predicate. p complexity of the update rules shown in Equa- The last column shown in Table 1 shows the number tions 4.9 4.10 4.11 4.12 4.13 is O(cK), where K is of facts for each relation in YAGO2. We run all the the number oflatent features. 
The totalcomputational experiments on a 2.1 GHz Machine with 4GB memory complexity of Algorithm 1 is then O(|D |·cK), where running Linux operating system. The algorithms are p |D |isthetotalsizeofpre-sampledtriplesshowninline implemented in Python language along with NumPy p 4 of Algorithm 1. and SciPy libraries for linear algebra operations. The HR Result of YAGO Dataset (K=50) ARHR Result of YAGO Dataset (K=50) AUC Result of YAGO Dataset (K=50) 0.7 0.40 1.0 Rand Rand Rand 0.6 MP 0.35 MP MP MF MF 0.8 MF BPR 0.30 BPR BPR 0.5 HR00..43 ARHR000...212055 AUC00..46 0.2 0.10 0.2 0.1 0.05 0.0importexporitnterelastnguadgeealwhitahppepnaIrnticipcaotnenehctaschiinlfdluewnrciteeMusiecdit own 0.00importexportinteresltanguagdeealwithhappenpIanrticipatceonnecthaschilidnfluewnrciteeMusicfoerdit own 0.0importexportinteresltanguagdeealwithhappenpIanrticipatceonnecthaschilidnfluewnrciteeMusicfoerdit own (a) HR Comparison among different link (b) ARHR Comparison among different (c) AUC Comparison among different link recommendationmethods linkrecommendationmethods recommendation methods Figure 1: Link Recommendation Comparison on YAGO2 Relations 0.8 slope = 9.78 0.5 slope = 4.86 1.0 slope = 3.75 0.7 intercept = 0.15 intercept = 0.08 intercept = 0.63 rvalue = 0.75 0.4 rvalue = 0.72 0.9 rvalue = 0.49 0.6 0.5 0.3 0.8 HR00..43 ARHR0.2 AUC0.7 0.2 0.1 0.6 0.1 0.0 0.5 0.0 −0−.10.01 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 −0−.10.01 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0−.40.01 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 density density density (a) GraphDensityandHR (b) GraphDensityandARHR (c) GraphDensityandAUC 0.8 slope = 0.06 0.5 slope = 0.03 1.0 slope = 0.04 0.7 intercept = 0.05 intercept = 0.02 intercept = 0.51 rvalue = 0.58 0.4 rvalue = 0.57 0.9 rvalue = 0.71 0.6 0.5 0.3 0.8 HR00..43 ARHR0.2 AUC0.7 0.2 0.1 0.6 0.1 0.0 0.5 0.0 −0.10 2 4 6 8 10 −0.10 2 4 6 8 10 0.40 2 4 6 8 10 degree degree degree (d) GraphAverageDegreeandHR (e) GraphAverageDegreeandARHR (f) GraphAverageDegreeandAUC 0.8 
0.5 1.0 slope = -0.79 intercept = 0.58 slope = -0.39 slope = -0.5 0.6 rvalue = -0.65 0.4 irnvtaelurcee =pt -=0. 601.29 0.9 irnvtaelurcee =pt -=0. 609.87 0.3 0.8 HR0.4 ARHR0.2 AUC0.7 0.2 0.1 0.6 0.0 0.0 0.5 −0.20.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 −0.10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.40.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 clustering coefficient clustering coefficient clustering coefficient (g) ClusteringCoefficientandHR (h) ClusteringCoefficientandARHR (i) ClusteringCoefficientandAUC Figure 2: Quantitative Analysis Between Graph Topology and Link Recommendation Model Performance Relation #Subjects #Objects #ofFactsinYAGO2 knowledge base, this method presents a non- Import 142 62 391 personalized ranked object list based on how often Export 140 176 579 objectentities are connectedamong allsubject en- isInterestedIn 358 213 464 tities. hasOfficialLanguage 583 214 964 dealsWith 131 124 945 3. MF: The matrix factorization method is proposed happenedIn 7121 5526 12500 participatedIn 2330 7043 16809 by[16],whichusesapoint-wisestrategyforsolving isConnectedTo 2835 4391 33581 the one-class item recommendation problem. hasChild 10758 12800 17320 influence 8056 9153 25819 During the model evaluation stage, we use three wroteMusicFor 5109 21487 24271 edited 549 5673 5946 popularmetrics,namelyHitRate(HR),AverageRecip- owns 8330 24422 26536 rocalHit-Rank(ARHR),andAreaUnderCurve(AUC), tomeasurethelink recommendationqualityofourpro- Table 1: Statistics of Various Relations in YAGO2 posedapproachincomparisontobaselinemethods. HR Dataset is defined as follows: software is available online for download 2. 
#hits (5.14) HR= #subjects 5.1 Experimental Setting For our experiment, in order to demonstrate the per- where #subjects is the total number of subject en- formanceof our proposedlink predictionmodel, we use tities in test set, and #hits is the number of subjects the YAGO2 dataset and several evaluation metrics for whoseobjectentityinthetestsetisrecommendedinthe allcomparedalgorithms. Particularly,foreachrelation, size-Nrecommendationlist. Thesecondevaluationmet- we split the data into a training part, used for model ric,ARHR,consideringtherankingoftherecommended training,andatestpart,usedformodelevaluation. We objectforeachsubjectentityinknowledgegraph,isde- apply5-timeleaveoneoutevaluationstrategy,wherefor fined as below: eachsubject,werandomlyremoveonefact(onesubject- object pair) and place it into test set S and remain- test #hits ing in the training set S . For every subject, the 1 1 train (5.15) ARHR= X training model will generate a size-N rankedlist of rec- #subjects pi i=1 ommended objects for recommendationtask. The eval- where if an object of a subject is recommended for uation is conducted by comparing the recommendation connection in knowledge graph which we name as hit listofeachsubjectandtheobjectentityofthatsubject underthisscenario,p isthepositionoftheobjectinthe in the test set. Grid search is applied to find regular- i rankedrecommendationlist. As wecansee,ARHRis a izationparameters,andwesetthe valuesofparameters weighted version of HR and it captures the importance used in section 4.2 as λs = λo+ = λo− = 0.005. For of recommended object in the recommendation list. other model parameters, we fix learning rate α = 0.2, The last metric, AUC is defined as follows: and number of latent factors K = 50 respectively. For parameter in model evaluation, we set N =10. 
(5.16) In order to illustrate the merit of our proposed approach, we compare our model with the following AUC= #sub1jectsPs∈subjects |E1(s)|P(o+,o−)∈E(s)δ(xs,o+ >xs,o−) methodsforlinkpredictioninaknowledgegraph. Since Where E(s) = {(o+,o−)|(s,o+) ∈ S ∩(s,o−) 6∈ test the problemwesolveinthis paperissimilarto theone- (S ∪S )}, and δ() is the indicator function. test train classitemrecommendation[23]inrecommendersystem Forallofthreemetrics,highervaluesindicatebetter domain, we consider the following state-of-the-art one- model performance. Specifically, the trivial AUC of a class recommendation methods as baseline approaches randompredictoris 0.5 andthe best value ofAUC is 1. for comparison. 5.2 YAGO2 Relation Prediction Performance 1. Random (Rand): For eachrelation,this method randomly selects subject-object entity pair for link Figure 1 shows the average link prediction per- recommendation task. formance for YAGO2 relations using various meth- 2. Most Popular (MP): For each predicate in ods. Our proposed latent feature embedding approach shows overall improvement compared with other algo- 2https://sites.google.com/site/baichuanzhangpurdue rithms on most of relations in YAGO2. For instance, for all the YAGO2 predicates used in the experiment, intercept,andcorrelationcoefficient(rvalue)tocapture ourproposedmodelconsistentlyoutperformsMFbased the association trend. method, which demonstrates the empirical experience From Figure 2, both graphdensity and graphaver- that pairwise ranking based method achieves much age degree show strong positive correlation signal with better performance than pointwise regression based proposed link prediction model as demonstrated by method given implicit feedback for link recommenda- rvalue. As our approach is inspired by collaborative tion task. 
Comparedwith Popularitybased recommen- filtering for recommender systems that accept a user- dation method MP, our method obtains better perfor- item matrix as input, for resultant graphof each predi- mance for most predicates. For example, predicates cate, higher graph density indicates higher matrix den- suchas“participate”,“connect”,“hasChild”,and“influ- sityinuser-itemmatrix,whichnaturallyleadstobetter ence”,ourproposedmodelachievesmorethan10times recommendation performance in recommender system better performance in terms of both HR and ARHR. domain. Similar explanation can be adapted to graph However, for several predicates such as “import”, “ex- average degree. For the clustering coefficient, it shows port”,and“language”,MP basedmethod performsthe strong negative correlation signal with link prediction best among all the competing methods. The good per- modelperformance. Forinstance,intermsofAUC,the formance of MP is owing to the semantic meaning of rvalue is around −0.69. As clustering coefficient (cc) specific predicate. For instance, “import” represents is the number of closed triples over the total number Country/Product relation in YAGO2, which indicates of triples in graph, smaller value of cc indicates lower the types of its subject and object entities are geo- fraction of closed triples in the graph. Based on the graphic region and commodity respectively. For such transitivity property of a social graph,which states the a predicate, most popular object entities such as food, friends of your friend have high likelihood to be friends cloth, fuel are linked to most of the countries, which themselves [26,30], it is relatively easier for link pre- helps MP basedmethod obtain goodlink recommenda- diction model to predict (i.e.,hit) such link with open tion performance. triple property in the graph, which leads to better link prediction performance. 
5.3 Analysis and Discussion

Figure 1 shows that link prediction model performance varies widely from predicate to predicate in the YAGO2 knowledge base. For example, the HR of predicate “dealsWith” is significantly better than that of “own”. Thus it is critical that we quantitatively understand the model performance across various relations in a knowledge graph. Recall from the Problem Statement that, given a predicate p, our model M_p only accounts for the bipartite subgraph G_p. Motivated by [19], we study the impact of the resultant graph structure of G_p on the performance of M_p.

For each predicate p, we compute several graph topology metrics on its bipartite subgraph G_p, such as graph density, graph average degree, and clustering coefficient. Figure 2 shows the quantitative analysis between graph structure and link prediction model performance of each predicate. In each subfigure, the x-axis represents the computed graph topology metric value of each predicate and the y-axis denotes our proposed link prediction model performance in terms of HR, ARHR, and AUC. Each cross point shown in blue represents one specific YAGO2 predicate used in our experiments. We then develop a linear regression model to understand the correlation between link prediction model performance and each graph metric. For each linear regression curve, shown in red, we also report its slope,

6 Conclusion and Future Work

Inspired by the success of collaborative filtering algorithms for recommender systems, we propose a latent feature based embedding model for the task of link prediction in a knowledge graph. Our proposed method provides a measure of “confidence” for adding a triple into the knowledge graph. We evaluate our implementation on the well-known YAGO2 knowledge graph. The experiments show that our Bayesian Personalized Ranking based latent feature embedding approach achieves better performance compared with two state-of-the-art recommender system models: Most Popular and Matrix Factorization. We also develop a linear regression model to quantitatively study the correlation between the performance of the link prediction model itself and various topological metrics of the graph from which the models are constructed. The regression analysis shows strong correlation between the link prediction performance and graph topological features, such as graph density, average degree and clustering coefficient.

For a given predicate, we build link prediction models solely based on the bipartite subgraph of the original knowledge graph. However, as real-world experience suggests, the existence of a relation between two entities can also be predicted from the presence of other relations, either direct or through common neighbors. As an example, the knowledge of where someone studies and who they are friends with is useful to predict possible workplaces. Incorporating such intuition as “social signals” into our current model will be the prime candidate for immediate future work. Another direction for future work would be to update the knowledge graph based on the newer facts that become available over time in streaming data sources.

Acknowledgement

This work was supported by the Analysis In Motion Initiative at Pacific Northwest National Laboratory, which is operated by Battelle Memorial Institute, and by Mohammad Al Hasan’s NSF CAREER Award (IIS-1149851).

References

[1] A. Bordes and E. Gabrilovich. Constructing and mining web-scale knowledge graphs. In SIGKDD, 2014.
[2] R. Bro. Parafac. tutorial and applications. Chemometrics and Intelligent Laboratory Systems, 1997.
[3] K.-W. Chang, W. tau Yih, B. Yang, and C. Meek. Typed tensor decomposition of knowledge bases for relation extraction. In ACL, 2014.
[4] P. Chen and A. O. H. III. Deep community detection. IEEE Transactions on Signal Processing, 2015.
[5] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. Natural language processing (almost) from scratch. JMLR, 2011.
[6] X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In SIGKDD, 2014.
[7] L. Drumond, S. Rendle, and L. Schmidt-Thieme. Predicting rdf triples in incomplete knowledge bases with tensor factorization. In Proceedings of the 27th Annual ACM Symposium on Applied Computing, 2012.
[8] O. Etzioni, M. Banko, S. Soderland, and D. S. Weld. Open information extraction from the web. Communications of the ACM, pages 68–74, 2008.
[9] D. A. Ferrucci, E. W. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. M. Prager, N. Schlaefer, and C. A. Welty. Building watson: An overview of the deepqa project. AI Magazine, 2010.
[10] N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In IJCAI, pages 1300–1309, 1999.
[11] M. A. Hasan, V. Chaoji, S. Salem, and M. Zaki. Link prediction using supervised learning. In Proc. of SDM06 workshop on Link Analysis, Counterterrorism and Security, 2006.
[12] J. Hoffart, F. M. Suchanek, K. Berberich, E. Lewis-Kelham, G. de Melo, and G. Weikum. Yago2: Exploring and querying world knowledge in time, space, context, and many languages. In WWW, 2011.
[13] R. Jenatton, N. L. Roux, A. Bordes, and G. R. Obozinski. A latent factor model for highly multi-relational data. In NIPS, 2012.
[14] S. Jiang, D. Lowd, and D. Dou. Learning to refine an automatically extracted knowledge base using markov logic. In ICDM, pages 912–917, 2012.
[15] X. Jiang, V. Tresp, Y. Huang, and M. Nickel. Link prediction in multi-relational graphs using additive models. In SeRSy, pages 1–12, 2012.
[16] Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. Journal of Computer Science, pages 30–37, 2009.
[17] N. Lao and W. W. Cohen. Relational retrieval using a combination of path-constrained random walks. Journal of Machine learning, pages 53–67, 2010.
[18] N. Lao, T. Mitchell, and W. W. Cohen. Random walk inference and learning in a large scale knowledge base. In EMNLP, pages 529–539, 2011.
[19] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In CIKM, 2003.
[20] M. Nickel, X. Jiang, and V. Tresp. Reducing the rank in relational factorization models by including observable patterns. In NIPS, pages 1179–1187, 2014.
[21] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. A review of relational machine learning for knowledge graphs: From multi-relational link prediction to automated knowledge graph construction. arXiv preprint arXiv:1503.00759, 2015.
[22] M. Nickel, V. Tresp, and H. peter Kriegel. A three-way model for collective learning on multi-relational data. In ICML, pages 809–816, 2011.
[23] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. Bpr: Bayesian personalized ranking from implicit feedback. In UAI, 2009.
[24] M. Richardson and P. Domingos. Markov logic networks. Machine learning, pages 107–136, 2006.
[25] S. Riedel, L. Yao, A. McCallum, and B. M. Marlin. Relation extraction with matrix factorization and universal schemas. In HLT-NAACL, 2013.
[26] T. K. Saha, B. Zhang, and M. Al Hasan. Name disambiguation from link data in a collaboration graph using temporal and topological features. Social Network Analysis and Mining, pages 1–14, 2015.
[27] R. Socher, D. Chen, C. D. Manning, and A. Ng. Reasoning with neural tensor networks for knowledge base completion. In NIPS, pages 926–934, 2013.
[28] C. H. Tan, E. Agichtein, P. Ipeirotis, and E. Gabrilovich. Trust, but verify: Predicting contribution quality for knowledge base construction and curation. In WSDM, pages 553–562, 2014.
[29] B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In UAI, 2002.
[30] B. Zhang, T. K. Saha, and M. Al Hasan. Name disambiguation from link data in a collaboration graph. In ASONAM, 2014.

