IMPERIAL COLLEGE LONDON DEPARTMENT OF COMPUTING PROBXHAIL: An Abductive-Inductive Algorithm for Probabilistic Inductive Logic Programming Author: Supervision: Stanislav DRAGIEV Dr. Alessandra RUSSO Dr. Krysia BRODA Mark LAW June13,2016 iii Abstract This project describes the development of a machine learning model that integrates Bayesian statistics in logic-based learning. This results in a modelthatiseasytocomprehend,abenefitoffirst-orderlogictheories,and isabletoexpressuncertaintylikestatisticalmodels. Wedothisbydefining aprogramminglanguageforprobabilisticlogicprogramming,andthende- velopingalgorithmsthatlearnthestatisticalparametersoftheprogramand the structure of its first-order logic theory. The probabilistic programming languagehascertaininterestingpropertieswhencomparedtostate-of-the- art probabilistic logic programs, including the use of stable set semantics and Bayesian priors. In addition, the algorithm introduces the concept of abductive-inductivelearninginprobabilisticinductivelogicprogramming (PILP),i.e. learningfirst-ordertheoriesbycombiningabductionandinduc- tion. First,wedefineannotatedliteralprograms,atypeofprobabilisticlogic programinwhicheveryliteralisannotatedwithaparameterforaBernoulli or Beta distribution. The associated probabilistic model, which is defined byagenerativestoryforanormallogicprogram,usesindependentBernoulli trials to determine whether each literal should be included in the theory. We outline parameter learning algorithms for the program based on sta- tistical abduction. We define PROBXHAIL (PROBabilistic eXtended Hy- brid Abductive-Inductive Learning), an inductive-abductive algorithm for structurallearningthatisageneralizationofXHAIL[26]. Thepropertiesof annotated literal programs and the corresponding machine learning algo- rithmsareevaluatedinlightofstate-of-the-artprobabilisticlogicprogram- minglanguages. v Acknowledgements IamverygratefulforthetimethatDr. AlessandraRussoandDr. Krysia BrodadedicatedtomyprojectandfortheguidanceIwasgiventotakethe project in an original direction. Mark Law also provided me with a lot of insight on various topics in logic programming, and his knowledge of howtooptimizeAnswerSetProgramswasparticularlyhelpfultome. Iam alsoverygratefulforCalinRaresTurliuc’shelpinunderstandingstatistical abductionandprobabilisticlogicprogramming. Last,butnotleast,Iwould liketothankmyparentsfortheirsupportduringmystudies. vii Contents Abstract iii Acknowledgements v 1 Introduction 1 2 Background 5 2.1 ClauseTheorySemantics . . . . . . . . . . . . . . . . . . . . . 5 2.2 BDDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3 InductiveLogicProgramming . . . . . . . . . . . . . . . . . . 7 2.3.1 InverseentailmentandBottomGeneralization . . . . 8 2.3.2 (Extended)HybridAbductiveInductiveLearning . . 9 2.4 ProbabilityTheory . . . . . . . . . . . . . . . . . . . . . . . . 10 2.4.1 BayesTheorem . . . . . . . . . . . . . . . . . . . . . . 10 2.4.2 CategoricalDistribution . . . . . . . . . . . . . . . . . 11 2.4.3 DirichletDistribution . . . . . . . . . . . . . . . . . . 11 2.4.4 GenerativeProbabilisticModelsandPlatenotation . 11 2.4.5 LDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.4.6 Expectation-Maximization. . . . . . . . . . . . . . . . 13 2.4.7 MarkovChainMonteCarlo(MCMC)Methods . . . . 14 2.5 ProbabilisticLogicPrograms . . . . . . . . . . . . . . . . . . 15 2.5.1 DistributionSemantics . . . . . . . . . . . . . . . . . . 15 2.5.2 ProbLog . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.5.3 LogicProgramswithAnnotatedDisjunctionsandEM- BLEM . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.6 StatisticalAbduction . . . . . . . . . . . . . . . . . . . . . . . 18 2.6.1 PeircebayesandLatentDirichletAnalysis . . . . . . . 19 3 AnnotatedLiteralPrograms 21 3.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.1.1 UnprioredModel . . . . . . . . . . . . . . . . . . . . . 21 3.1.2 PrioredModel . . . . . . . . . . . . . . . . . . . . . . . 22 3.2 FormalDefinition . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.3 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.3.1 UnprioredModel . . . . . . . . . . . . . . . . . . . . . 24 3.3.2 PrioredModel . . . . . . . . . . . . . . . . . . . . . . . 25 3.4 ComparisontoDistributionSemantics . . . . . . . . . . . . . 26 3.4.1 UnprioredModel . . . . . . . . . . . . . . . . . . . . . 26 TranslationfromProbLog . . . . . . . . . . . . . . . . 27 TranslationtoProbLog . . . . . . . . . . . . . . . . . . 28 3.4.2 Prioredmodel . . . . . . . . . . . . . . . . . . . . . . . 30 viii 4 ASPinputprogramsinPeircebayes 31 4.1 BraveabductioninASP . . . . . . . . . . . . . . . . . . . . . 31 4.2 AutomatictranslationofPeircebayesinputprograms . . . . 32 4.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.3.1 LDAandCLDA. . . . . . . . . . . . . . . . . . . . . . 34 4.3.2 RIM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 5 ParameterLearning 39 5.1 PartialInterpretations . . . . . . . . . . . . . . . . . . . . . . 39 5.2 PrioredModel . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.3 Unprioredmodel . . . . . . . . . . . . . . . . . . . . . . . . . 41 6 PILPwithPROBXHAIL 43 6.1 PILPLearningtask . . . . . . . . . . . . . . . . . . . . . . . . 43 6.1.1 Unprioredmodel . . . . . . . . . . . . . . . . . . . . . 43 6.1.2 Prioredmodel . . . . . . . . . . . . . . . . . . . . . . . 43 PointEstimate . . . . . . . . . . . . . . . . . . . . . . . 43 BayesianMethod . . . . . . . . . . . . . . . . . . . . . 43 6.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 6.2.1 Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 6.2.2 Description . . . . . . . . . . . . . . . . . . . . . . . . 44 6.2.3 StructureLearning . . . . . . . . . . . . . . . . . . . . 44 6.2.4 ParameterLearning . . . . . . . . . . . . . . . . . . . 46 6.3 Proofofcorrectness . . . . . . . . . . . . . . . . . . . . . . . . 46 6.3.1 Unprioredmodel . . . . . . . . . . . . . . . . . . . . . 46 6.3.2 Prioredmodel . . . . . . . . . . . . . . . . . . . . . . . 47 7 PILPEvaluation 49 7.1 Syntheticdataexperiment-GroundKernel . . . . . . . . . . 49 7.1.1 EMBLEM . . . . . . . . . . . . . . . . . . . . . . . . . 49 7.1.2 Peircebayes . . . . . . . . . . . . . . . . . . . . . . . . 50 LogLikelihoodandMeanSquareErroragainstprob- abilities . . . . . . . . . . . . . . . . . . . . . 50 LogLikelihoodandMeanSquareErroragainstnum- berofliterals . . . . . . . . . . . . . . . . . . 50 7.2 Syntheticdata-Ambiguousdataset . . . . . . . . . . . . . . 51 7.2.1 EMBLEM . . . . . . . . . . . . . . . . . . . . . . . . . 52 7.2.2 Peircebayes . . . . . . . . . . . . . . . . . . . . . . . . 52 8 RelatedWork 53 8.1 ProbFOILandProbFOIL+ . . . . . . . . . . . . . . . . . . . . 53 8.1.1 PILPtask . . . . . . . . . . . . . . . . . . . . . . . . . . 53 8.1.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 53 8.2 SLIPCASEandSLIPCOVER . . . . . . . . . . . . . . . . . . . 54 8.2.1 PILPtask . . . . . . . . . . . . . . . . . . . . . . . . . . 54 8.2.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 54 8.3 SemCP-Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 ix 9 ConclusionandFutureWork 57 9.1 Achievements . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 9.2 PotentialExtensions . . . . . . . . . . . . . . . . . . . . . . . 57 9.2.1 Data-drivenmostspecifichypothesis . . . . . . . . . 57 9.2.2 Probabilisticmodelextension . . . . . . . . . . . . . . 58 9.2.3 Potentialapplications . . . . . . . . . . . . . . . . . . 59 9.3 ConcludingRemarks . . . . . . . . . . . . . . . . . . . . . . . 60 A Peircebayestaskimplementations 61 A.1 RIM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 A.1.1 Prologimplementation. . . . . . . . . . . . . . . . . . 61 A.1.2 ASPimplementation . . . . . . . . . . . . . . . . . . . 62 OuterQueryGrounding . . . . . . . . . . . . . . . . . 62 AbductiveTask . . . . . . . . . . . . . . . . . . . . . . 62 A.2 SeededLDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 A.2.1 Prolog . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 A.2.2 ASPrepresentation . . . . . . . . . . . . . . . . . . . . 64 OuterQueries . . . . . . . . . . . . . . . . . . . . . . . 64 InnerQueries . . . . . . . . . . . . . . . . . . . . . . . 64 A.3 FastLDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 A.3.1 OuterQuery . . . . . . . . . . . . . . . . . . . . . . . . 65 A.3.2 InnerQuery . . . . . . . . . . . . . . . . . . . . . . . . 65 Bibliography 67
Description: