IMPERIAL COLLEGE LONDON
DEPARTMENT OF COMPUTING
PROBXHAIL:
An Abductive-Inductive Algorithm
for Probabilistic Inductive Logic
Programming
Author: Stanislav DRAGIEV
Supervisors: Dr. Alessandra RUSSO, Dr. Krysia BRODA, Mark LAW
June 13, 2016
Abstract
This project describes the development of a machine learning model that integrates Bayesian statistics into logic-based learning. The result is a model that is easy to comprehend, a benefit of first-order logic theories, and that can express uncertainty in the manner of statistical models. We achieve this by defining a programming language for probabilistic logic programming, and then developing algorithms that learn both the statistical parameters of a program and the structure of its first-order logic theory. Compared to state-of-the-art probabilistic logic programming languages, the language has several distinctive properties, including the use of stable model semantics and Bayesian priors. In addition, the algorithm introduces the concept of abductive-inductive learning to probabilistic inductive logic programming (PILP), i.e. learning first-order theories by combining abduction and induction.
First, we define annotated literal programs, a type of probabilistic logic program in which every literal is annotated with the parameter of a Bernoulli or Beta distribution. The associated probabilistic model, which is defined by a generative story for a normal logic program, uses independent Bernoulli trials to determine whether each literal should be included in the theory. We outline parameter learning algorithms for such programs based on statistical abduction. We then define PROBXHAIL (PROBabilistic eXtended Hybrid Abductive-Inductive Learning), an abductive-inductive algorithm for structure learning that is a generalization of XHAIL [26]. The properties of annotated literal programs and the corresponding machine learning algorithms are evaluated in comparison with state-of-the-art probabilistic logic programming languages.
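To make the generative story above concrete, the following minimal sketch samples a normal clause from an annotated clause by drawing one independent Bernoulli trial per body literal; under the priored model each Bernoulli parameter is itself first drawn from a Beta prior. The clause, its annotations, and all names here are hypothetical illustrations, not the implementation developed in this project (which targets Prolog and ASP).

import random

# Illustrative sketch only (not the thesis implementation): an annotated
# clause pairs each body literal with the parameter of a Bernoulli
# distribution (unpriored model) or of a Beta prior on that parameter
# (priored model).
annotated_clause = {
    "head": "flies(X)",
    "body": [("bird(X)", 0.9), ("injured(X)", 0.3)],  # (literal, Bernoulli p)
}

def sample_unpriored(clause, rng=random):
    """Unpriored model: one independent Bernoulli trial decides whether
    each annotated body literal appears in the sampled normal clause."""
    body = [lit for (lit, p) in clause["body"] if rng.random() < p]
    return clause["head"] + ((" :- " + ", ".join(body)) if body else "") + "."

def sample_priored(clause, beta_priors, rng=random):
    """Priored model: each literal's Bernoulli parameter is itself drawn
    from a Beta(alpha, beta) prior before the inclusion trial."""
    body = []
    for (lit, _), (alpha, beta) in zip(clause["body"], beta_priors):
        p = rng.betavariate(alpha, beta)  # Beta-distributed Bernoulli parameter
        if rng.random() < p:
            body.append(lit)
    return clause["head"] + ((" :- " + ", ".join(body)) if body else "") + "."

print(sample_unpriored(annotated_clause))                          # e.g. flies(X) :- bird(X).
print(sample_priored(annotated_clause, [(2.0, 1.0), (1.0, 3.0)]))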
Acknowledgements
I am very grateful for the time that Dr. Alessandra Russo and Dr. Krysia Broda dedicated to my project and for the guidance I was given to take the project in an original direction. Mark Law also provided me with a lot of insight on various topics in logic programming, and his knowledge of how to optimize Answer Set Programs was particularly helpful to me. I am also very grateful for Calin Rares Turliuc's help in understanding statistical abduction and probabilistic logic programming. Last, but not least, I would like to thank my parents for their support during my studies.
Contents
Abstract
Acknowledgements
1 Introduction
2 Background
2.1 Clause Theory Semantics
2.2 BDDs
2.3 Inductive Logic Programming
2.3.1 Inverse Entailment and Bottom Generalization
2.3.2 (Extended) Hybrid Abductive Inductive Learning
2.4 Probability Theory
2.4.1 Bayes' Theorem
2.4.2 Categorical Distribution
2.4.3 Dirichlet Distribution
2.4.4 Generative Probabilistic Models and Plate Notation
2.4.5 LDA
2.4.6 Expectation-Maximization
2.4.7 Markov Chain Monte Carlo (MCMC) Methods
2.5 Probabilistic Logic Programs
2.5.1 Distribution Semantics
2.5.2 ProbLog
2.5.3 Logic Programs with Annotated Disjunctions and EMBLEM
2.6 Statistical Abduction
2.6.1 Peircebayes and Latent Dirichlet Allocation
3 Annotated Literal Programs
3.1 Syntax
3.1.1 Unpriored Model
3.1.2 Priored Model
3.2 Formal Definition
3.3 Semantics
3.3.1 Unpriored Model
3.3.2 Priored Model
3.4 Comparison to Distribution Semantics
3.4.1 Unpriored Model
    Translation from ProbLog
    Translation to ProbLog
3.4.2 Priored Model
4 ASP Input Programs in Peircebayes
4.1 Brave Abduction in ASP
4.2 Automatic Translation of Peircebayes Input Programs
4.3 Evaluation
4.3.1 LDA and CLDA
4.3.2 RIM
5 Parameter Learning
5.1 Partial Interpretations
5.2 Priored Model
5.3 Unpriored Model
6 PILP with PROBXHAIL
6.1 PILP Learning Task
6.1.1 Unpriored Model
6.1.2 Priored Model
    Point Estimate
    Bayesian Method
6.2 Algorithm
6.2.1 Input
6.2.2 Description
6.2.3 Structure Learning
6.2.4 Parameter Learning
6.3 Proof of Correctness
6.3.1 Unpriored Model
6.3.2 Priored Model
7 PILP Evaluation
7.1 Synthetic Data Experiment - Ground Kernel
7.1.1 EMBLEM
7.1.2 Peircebayes
    Log Likelihood and Mean Square Error against Probabilities
    Log Likelihood and Mean Square Error against Number of Literals
7.2 Synthetic Data - Ambiguous Dataset
7.2.1 EMBLEM
7.2.2 Peircebayes
8 Related Work
8.1 ProbFOIL and ProbFOIL+
8.1.1 PILP Task
8.1.2 Algorithm
8.2 SLIPCASE and SLIPCOVER
8.2.1 PILP Task
8.2.2 Algorithm
8.3 SemCP-Logic
9 Conclusion and Future Work
9.1 Achievements
9.2 Potential Extensions
9.2.1 Data-driven Most Specific Hypothesis
9.2.2 Probabilistic Model Extension
9.2.3 Potential Applications
9.3 Concluding Remarks
A Peircebayes Task Implementations
A.1 RIM
A.1.1 Prolog Implementation
A.1.2 ASP Implementation
    Outer Query Grounding
    Abductive Task
A.2 Seeded LDA
A.2.1 Prolog
A.2.2 ASP Representation
    Outer Queries
    Inner Queries
A.3 FastLDA
A.3.1 Outer Query
A.3.2 Inner Query
Bibliography