Representation Learning for Relational Data (PDF)

143 pages · 2017 · 9.7 MB · English
Representation learning for relational data
Ludovic dos Santos

To cite this version: Ludovic dos Santos. Representation learning for relational data. Machine Learning [cs.LG]. Université Pierre et Marie Curie - Paris VI, 2017. English. NNT: 2017PA066480. tel-01803188.

HAL Id: tel-01803188
https://theses.hal.science/tel-01803188
Submitted on 30 May 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

UNIVERSITÉ PIERRE ET MARIE CURIE
École Doctorale Informatique, Télécommunications et Électronique (Paris)
Laboratoire d'Informatique de Paris 6

DOCTORAL THESIS
Representation Learning for Relational Data

Author: Ludovic DOS SANTOS
Supervisor: Patrick GALLINARI
Co-Supervisor: Benjamin PIWOWARSKI

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Computer Science

Jury Members:
Mr. Thierry ARTIÈRES, Reporter
Mr. Younes BENNANI, Examiner
Mr. Patrick GALLINARI, Supervisor
Mr. Rémi GILLERON, Reporter
Mrs. Marie-Jeanne LESOT, Examiner
Mr. Benjamin PIWOWARSKI, Co-Supervisor

Wednesday 13th December, 2017

Abstract

The increasing use of social and sensor networks generates large quantities of data that can be represented as complex graphs. Many tasks, from information analysis to prediction and retrieval, can be imagined on such data, where the relations between graph nodes should be informative.
In this thesis, we propose models for three different tasks:

• Graph node classification
• Relational time series forecasting
• Collaborative filtering

All the proposed models use the representation learning framework, in either its deterministic or its Gaussian variant.

First, we propose two algorithms for the heterogeneous graph labeling task, one using deterministic representations and the other Gaussian representations. Contrary to other state-of-the-art models, our solution learns the edge weights while simultaneously learning the representations and the classifiers.

Second, we propose an algorithm for relational time series forecasting, where the observations are correlated not only within each series but also across the different series. We use Gaussian representations in this contribution, which was an opportunity to see in which ways using Gaussian representations instead of deterministic ones is profitable.

Finally, we apply the Gaussian representation learning approach to the collaborative filtering task. This is preliminary work to check whether the properties of Gaussian representations observed on the two previous tasks also hold for the ranking one. The goal of this work was then to generalize the approach to more general relational data, and not only bipartite graphs between users and items.

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Context
    1.1.1 Tasks
    1.1.2 Types of Graphs
    1.1.3 Outline
  1.2 Tasks Studied during the Thesis
    1.2.1 The Classification Task
    1.2.2 The Forecasting Task
    1.2.3 The Ranking Task
  1.3 Learning Representations
  1.4 Contributions
    1.4.1 Learning Deterministic Representations for Heterogeneous Graph Node Classification (Chapter 4)
    1.4.2 Learning Gaussian Representations
      Learning Gaussian Representations for Heterogeneous Graph Node Classification (Chapter 5)
      Learning Gaussian Representations for Relational Time Series Forecasting (Chapter 6)
      Learning Gaussian Representations for Ranking (Chapter 7)
  1.5 Thesis Organization

2 Learning Representations for Relational Data
  2.1 Introduction
  2.2 Learning Representations in Graphs
    2.2.1 Inductive and Transductive Algorithms
      Induction
      Transduction
    2.2.2 Supervised, Unsupervised and Semi-Supervised Learning
      Unsupervised Learning
      Supervised Learning
      Semi-Supervised Learning
      Our Contributions
  2.3 Learning Deterministic Representations
    2.3.1 Unsupervised Models
      Learning from Context
    2.3.2 Supervised and Semi-Supervised Models
      Learning Nodes and Relationship Representations in Knowledge Bases
      From Unlabeled to Labeled Data in Classification
    2.3.3 Other Applications of Representation Learning
  2.4 Capturing Uncertainties in Representations
      Bayesian Approaches
      Direct Approaches
  2.5 Conclusion

I Learning Deterministic Representations and Application to Classification

3 State of the Art: Graph Node Classification
  3.1 Introduction
  3.2 Graph Node Classification
    3.2.1 Simple Transductive Model for Graph Node Classification
    3.2.2 Other Graph Node Classification Works
  3.3 Conclusion

4 Learning Deterministic Representations for Classification
  4.1 Introduction
  4.2 Learning Representation for Graph Node Classification
    4.2.1 Model
      Loss Function
      Classifier
      Transductive Graph Model
    4.2.2 Prior Parameters and Learning Algorithms
      Prior Parameters
      Learned Relation Specific Parameters
    4.2.3 Algorithm
    4.2.4 Experiments
      Datasets
      Construction of the LastFM Datasets
      Evidence for Heterogeneous Node Reciprocal Influence
      Comparison with Other Models
      Evaluation Measures and Protocol
      Results
      Importance of the Relations' Weights
      Label Correlation on the LastFM2 Dataset
  4.3 Conclusion

II Learning Gaussian Representations and Applications

5 Learning Gaussian Representations for Classification
  5.1 Introduction
  5.2 Graph Node Classification with Gaussian Embeddings
    5.2.1 Model
      Loss Function
      Classifier
      Graph Embedding
      Prior Parameters and Learned Relation Specific Parameters
      Algorithm
    5.2.2 Experiments
      Datasets
      Results
      Qualitative Discussion
  5.3 Conclusion

6 Learning Gaussian Representations for Relational Time Series Forecasting
  6.1 Introduction
    6.1.1 Relational Time Series
    6.1.2 Contribution
  6.2 Related Work
  6.3 Relational Time Series Forecasting with Gaussian Embeddings
    6.3.1 Model
      Notations and Task Formal Description
    6.3.2 Informal Description
      Model Definition
      Impact of Minimizing the KL-Divergence on Predicted Values
      Inference and Time Complexity
      Variants
    6.3.3 Experiments
      Datasets
      Baselines
      Experimental Protocol
      Results
  6.4 Conclusion

7 Learning Gaussian Representations for Collaborative Filtering
  7.1 Introduction
    7.1.1 Recommender Systems and Uncertainty
    7.1.2 Contribution
  7.2 Learning Gaussian Embeddings for Collaborative Filtering
    7.2.1 Model
      The Gaussian Embeddings Ranking Model
      Loss Function
      Ordering Items
    7.2.2 Experiments
      Analysis
  7.3 Conclusion

III Conclusion

8 Conclusion and Perspectives
  8.1 Conclusion
    8.1.1 Contributions
      Graph Node Classification
      Relational Time Series Forecasting
      Collaborative Filtering
    8.1.2 Learning Hyperparameters
    8.1.3 From Deterministic to Gaussian Representations
  8.2 Perspectives
    8.2.1 Classification Task
    8.2.2 Forecasting Task
    8.2.3 Ranking Task
    8.2.4 Learning Gaussian Embeddings for Knowledge Bases

Bibliography

List of Figures

1.1 Example of a heterogeneous multi-label graph with authors, comments, photos, videos, etc., linked together by different types of relations such as friendship, colleagues, video similarity, etc.
1.2 A tree-like structure that shows the dependencies between the different chapters of the thesis manuscript.
2.1 Drawing of representation learning for relational data where nodes are projected from the initial graph (left) to the common latent space (right) where classifiers are learned.
2.2 Example of a low D_KL(Z_j || Z_i).
2.3 Learned Gaussian representations with diagonal variances, from (Vilnis et al., 2015). The first letter of each word indicates the position of its mean.
2.4 Learned two-dimensional Gaussian representations of nodes on the Cora dataset, from (Bojchevski et al., 2017). Color indicates the class label.
3.1 Representation of the LastFM network with users, tracks, albums and artists. The different relations are represented with different types of lines.
4.1 Plots illustrating the inter-dependencies between labels of two node types for the LastFM2 dataset. Each plot shows P(Y_t1 | X_t2) for a specific variable couple (Y_t1, X_t2). The values X_t2 and Y_t1 have been reordered for clarity (see text). The x axis corresponding to variable X_t2 is on the bottom left of each plot, and the y axis corresponding to Y_t1 is on the bottom right.
4.2 Plots illustrating the inter-dependencies between labels on the DBLP corpus. A gray square means that there is no relation between the corresponding node types.
4.3 Plots illustrating the inter-dependencies between labels on the FlickR corpus.
4.4 Plots illustrating the inter-dependencies between labels on the LastFM2 corpus.
4.5 Plot of the average of the cumulative sum of conditional probabilities P(Y_t1 | X_t2) for all relation types on the LastFM2 dataset. The P(Y_t1 | X_t2)s are in decreasing order on the x axis. x = 40% means that 40% of the conditional probability values have been considered and the corresponding cumulative value is plotted on the y axis.
4.6 Plots of the average of the cumulative sum of conditional probabilities P(Y_t1 | X_t2) (with cumulative percentages on the x-axis) for all relation types on all datasets.
4.7 Evolution of w_r values over training steps for all relations r involving users for LastFM2. At convergence, w_{user→user} = 0.22, w_{user→track} = 0.28, w_{user→album} = 0.25, w_{user→artist} = 0.25.
