Modèles d'embeddings à valeurs complexes pour les graphes de connaissances (Complex-Valued Embedding Models for Knowledge Graphs)

161 Pages·2017·24.52 MB·English

UNIVERSITÉ GRENOBLE ALPES

Complex-Valued Embedding Models for Knowledge Graphs

by Théo Trouillon

A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in the Laboratoire d’Informatique de Grenoble, Xerox Research Centre Europe

December 2017

“Good tests kill flawed theories; we remain alive to guess again.”
Karl Popper

“A man must know how to sit on a bench, eat cheese, and be happy.”
Unknown author

Abstract

The explosion of widely available relational data in the form of knowledge graphs has enabled many applications, including automated personal agents, recommender systems and enhanced web search results. The very large size and notorious incompleteness of these databases call for automatic knowledge graph completion methods to make these applications viable. Knowledge graph completion, also known as link prediction, deals with automatically understanding the structure of large knowledge graphs (labeled directed graphs) to predict missing entries (labeled edges). An increasingly popular approach consists in representing a knowledge graph as a 3rd-order tensor, and using tensor factorization methods to predict its missing entries.

State-of-the-art factorization models propose different trade-offs between modeling expressiveness, time and space complexity, and generalization abilities. We introduce a new model, ComplEx (for Complex Embeddings), to reconcile expressiveness, complexity and generalization through the use of complex-valued factorization. We corroborate our approach theoretically and show that all possible knowledge graphs can be exactly decomposed by the proposed model.
Our approach based on complex embeddings is arguably simple, as it only involves a complex-valued trilinear product, whereas other methods resort to more and more complicated composition functions to increase their expressiveness. The proposed ComplEx model is scalable to large data sets as it remains linear in both space and time, while consistently outperforming alternative approaches on standard link-prediction benchmarks [1]. We also demonstrate its ability to learn useful vectorial representations for other tasks, by enhancing word embeddings that improve performance on the natural language problem of entailment recognition between pairs of sentences.

In the last part of this thesis, we explore the ability of factorization models to learn relational patterns from observed data. Because of their vectorial nature, it is hard not only to interpret why this class of models works so well, but also to understand where they fail and how they might be improved. We conduct an experimental survey of state-of-the-art models, not towards a purely comparative end, but as a means to gain insight into their inductive abilities. To assess the strengths and weaknesses of each model, we create simple tasks that exhibit first, atomic properties of knowledge graph relations, and then, common inter-relational inference through synthetic genealogies. Based on these experimental results, we propose new research directions to improve on existing models, including ComplEx.
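The abstract names a complex-valued trilinear product but does not spell out the scoring function. The following is a minimal sketch assuming the form used in the published ComplEx model, Re(<w_r, e_s, conj(e_o)>), where e_s and e_o are the subject and object embeddings and w_r is the relation embedding. The function names, toy sizes and Gaussian initialization are illustrative assumptions, not taken from the thesis code.

```python
import random


def complex_score(e_s, w_r, e_o):
    """Real part of the trilinear product <w_r, e_s, conj(e_o)>."""
    return sum(w * s * o.conjugate() for w, s, o in zip(w_r, e_s, e_o)).real


def random_embedding(rank, rng):
    """Hypothetical initializer: one complex vector with Gaussian parts."""
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(rank)]


rng = random.Random(0)
n_entities, n_relations, rank = 4, 2, 3  # toy sizes, chosen for illustration

# One complex vector per entity and per relation, so the number of
# parameters grows linearly with the number of entities and relations.
entities = [random_embedding(rank, rng) for _ in range(n_entities)]
relations = [random_embedding(rank, rng) for _ in range(n_relations)]

# Third-order score tensor X[r][s][o]: one real score per candidate
# labeled edge (subject s, relation r, object o).
X = [[[complex_score(entities[s], relations[r], entities[o])
       for o in range(n_entities)]
      for s in range(n_entities)]
     for r in range(n_relations)]

n_parameters = (n_entities + n_relations) * rank  # complex numbers stored
print(n_parameters)  # 18
```

Scoring a single triple costs one pass over the embedding dimension, which is consistent with the linear time and space complexity claimed in the abstract. Because the object embedding is conjugated, X[r][s][o] and X[r][o][s] generally differ, which is what lets a single embedding per entity represent directed edges.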
[1] Code is available at: https://github.com/ttrouill/complex

Résumé (English translation)

The explosion of relational data available in the form of knowledge graphs has enabled the development of many applications, including automated personal agents, recommender systems and improved online search results. The large size and incompleteness of these databases require the development of automatic completion methods to make these applications viable. Knowledge graph completion, also called link prediction, must automatically understand the structure of large knowledge graphs (labeled directed graphs) to predict missing entries (labeled edges). A popular approach consists in representing a knowledge graph as a third-order tensor, and using tensor decomposition methods to predict its missing entries.

Existing factorization models offer different trade-offs between expressiveness, time and space complexity, and generalization ability. We propose a new model called ComplEx, for “Complex Embeddings”, to reconcile expressiveness, complexity and generalization through a complex-valued factorization. We support our approach theoretically by showing that all possible knowledge graphs can be exactly decomposed by the proposed model.

Our approach, based on complex embeddings, remains simple, since it involves only a complex trilinear product, where other methods resort to increasingly sophisticated composition functions to increase their expressiveness. With linear time and space complexity, the proposed model scales to large data sets while exceeding the prediction scores of existing approaches on the reference data sets for link prediction [2]. We also demonstrate the ability of ComplEx to learn vector representations that are useful for other tasks, by enriching word embeddings, which improve predictions on the problem of recognizing entailment between pairs of sentences.

In the last part of this thesis, we explore the ability of factorization models to learn relational structures from observations. Because of their vectorial nature, it is difficult not only to understand why this class of models works so well, but also where they fail and how they can be improved. We conduct an experimental study of state-of-the-art models, not simply to compare them, but to understand their inductive abilities. To evaluate the strengths and weaknesses of each model, we first create simple tasks representing atomic properties of knowledge graph relations, then tasks representing common multi-relational inferences through synthetic genealogies. From these experimental results, we propose new research directions to improve existing models, including ComplEx.

[2] Code is available at: https://github.com/ttrouill/complex
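The text mentions atomic properties of knowledge graph relations without listing them here. As an illustration only, the sketch below takes symmetry and antisymmetry as two such properties (an assumption) and checks how the ComplEx scoring function Re(<w_r, e_s, conj(e_o)>) behaves when the relation embedding is restricted to purely real or purely imaginary values. All names and sizes are hypothetical.

```python
import math
import random


def complex_score(e_s, w_r, e_o):
    """Real part of the trilinear product <w_r, e_s, conj(e_o)>."""
    return sum(w * s * o.conjugate() for w, s, o in zip(w_r, e_s, e_o)).real


rng = random.Random(1)
rank = 5  # toy embedding dimension, chosen for illustration

# Two arbitrary entity embeddings.
e_a = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(rank)]
e_b = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(rank)]

# Relation embeddings restricted to the real or the imaginary axis.
w_real = [complex(rng.gauss(0, 1), 0.0) for _ in range(rank)]
w_imag = [complex(0.0, rng.gauss(0, 1)) for _ in range(rank)]

sym_ab = complex_score(e_a, w_real, e_b)
sym_ba = complex_score(e_b, w_real, e_a)
anti_ab = complex_score(e_a, w_imag, e_b)
anti_ba = complex_score(e_b, w_imag, e_a)

# Purely real relation: swapping subject and object keeps the score.
print(math.isclose(sym_ab, sym_ba, abs_tol=1e-12))     # True
# Purely imaginary relation: swapping subject and object flips the sign.
print(math.isclose(anti_ab, -anti_ba, abs_tol=1e-12))  # True
```

A general complex relation embedding mixes both behaviors, so the same entity vectors can serve symmetric and antisymmetric relations. Whether a trained model actually recovers such patterns from data is the empirical question the last part of the thesis addresses.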

Description:
Stéphane Clinchant, Jean-Marc Andreoli, Julien Perez, Sofia Michel, and Diana .. In artificial intelligence, many tasks require what is called commonsense .. and the unified medical language system (UMLS) data set [McCray, 2003] that links .. In the neural tensor network (NTN) model, Socher et al.