Probabilistic Horn abduction and Bayesian networks (cid:0) David Poole Department of Computer Science(cid:0) University of British Columbia(cid:0) Vancouver(cid:0) B(cid:1)C(cid:1)(cid:0) Canada V(cid:2)T (cid:3)Z(cid:4) poole(cid:5)cs(cid:1)ubc(cid:1)ca telephone(cid:6) (cid:7)(cid:2)(cid:8)(cid:4)(cid:9) (cid:10)(cid:11)(cid:11) (cid:2)(cid:11)(cid:12)(cid:4) fax(cid:6) (cid:7)(cid:2)(cid:8)(cid:4)(cid:9) (cid:10)(cid:11)(cid:11) (cid:12)(cid:4)(cid:10)(cid:12) September (cid:3)(cid:12)(cid:0) (cid:3)(cid:13)(cid:13)(cid:14) Abstract This paper presents a simple framework for Horn(cid:0)clause abduc(cid:0) tion(cid:1) with probabilities associated with hypotheses(cid:2) The framework incorporates assumptions about the rule base and independence as(cid:0) sumptions amongst hypotheses(cid:2) It is shown how any probabilistic knowledge representable in a discrete Bayesian belief network can be represented in this framework(cid:2) The main contribution is in (cid:3)nding a relationship between logical and probabilistic notions of evidential reasoning(cid:2) This provides a useful representation language in its own right(cid:1) providing a compromise between heuristic and epistemic ad(cid:0) equacy(cid:2) It also shows how Bayesian networks can be extended be(cid:0) yond a propositional language(cid:2) This paper also showshow a language withonly(cid:4)unconditionally(cid:5) independent hypothesescanrepresentany probabilistic knowledge(cid:1) and arguesthat it is better to invent new hy(cid:0) potheses to explain dependence rather than having to worry about dependence in the language(cid:2) (cid:0) Scholar(cid:0) Canadian Institute for Advanced Research(cid:1) (cid:0) Probabilistic Horn abduction and Bayesian networks (cid:1) (cid:0) Introduction Probabilistic Horn Abduction (cid:2)(cid:3)(cid:4)(cid:5) (cid:3)(cid:6)(cid:7) is a framework for logic(cid:8)based abduc(cid:8) tion that incorporates probabilities with assumptions(cid:9) This is being used as a framework for diagnosis (cid:2)(cid:3)(cid:4)(cid:7) that incorporates both pure Prolog (cid:2)(cid:10)(cid:11)(cid:7) and Bayesian Networks (cid:2)(cid:10)(cid:12)(cid:7) as special cases(cid:9) This paper expands on (cid:2)(cid:3)(cid:4)(cid:5) (cid:3)(cid:6)(cid:7) and develops the formal underpinnings of probabilistic Horn abduction(cid:5) shows the strong relationships to other formalisms and argues that it is a good representation language in its own right(cid:9) It can be motivated in a number of di(cid:13)erent ways(cid:14) Determiningwhatisinasystemfromobservations(cid:15)diagnosisandrecogni(cid:8) tion(cid:16) isan importantpart of AI(cid:9)There havebeen manylogic(cid:8)based proposals as to what is a diagnosis (cid:2)(cid:0)(cid:6)(cid:5) (cid:17)(cid:6)(cid:5) (cid:0)(cid:10)(cid:5) (cid:3)(cid:17)(cid:5) (cid:0)(cid:1)(cid:7)(cid:9) One problem with all of these proposals is that for any diagnostic problem of a reasonable size there are far too many logical possibilities to handle(cid:9) For example(cid:5) when considering fault models(cid:2)(cid:0)(cid:3)(cid:5) (cid:3)(cid:17)(cid:7)(cid:5) there is almost always an exponential numberof logical possibilities (cid:15)e(cid:9)g(cid:9)(cid:5) each component could be in one of its normal states or in the unknown state(cid:16)(cid:9) For practical problems(cid:5) many of the logically possible diagnoses are so unlikely that it is not worth considering them(cid:9) There is a problem(cid:5) however(cid:5) in removing the unlikely possibilities a priori (cid:15)those with a low prior probability(cid:16)(cid:14) it may happen that the unlikely occurrence is the actual truth in the world(cid:9) Analysis of the combinatorial explosions would however tend to suggest that we need to take into account probabilities of the diagnoses (cid:2)(cid:0)(cid:10)(cid:5) (cid:3)(cid:11)(cid:5) (cid:10)(cid:3)(cid:7)(cid:5) and not even generate the unlikely diagnoses (cid:15)i(cid:9)e(cid:9)(cid:5) those with a low posterior probability(cid:16)(cid:9) In a di(cid:13)erent strand of research(cid:5) Bayesian networks (cid:2)(cid:10)(cid:12)(cid:7)(cid:5) have proven to be a good representation for the sort of probabilistic independence found in many domains(cid:9) While the independence of Bayesian networks has been expressed in logic (cid:15)e(cid:9)g(cid:9)(cid:5) (cid:2)(cid:10)(cid:7)(cid:16)(cid:5) there has not been a mapping between logical speci(cid:18)cations of knowledge and Bayesian network representations(cid:5) where the logic is not at the meta(cid:8)level to the probabilistic knowledge(cid:9) This paper describes what could be termed as a logic of discrete Bayesian Networks(cid:5) where the logic expresses the object levelknowledge and the independence of Bayesian networks is an emergent property of the representation(cid:9) In the un(cid:8) certaintycommunitythere has also been a need to extend Bayesiannetworks to beyond a propositional language (cid:2)(cid:17)(cid:5) (cid:19)(cid:5) (cid:1)(cid:1)(cid:7)(cid:9) Probabilistic Horn abduction is naturally non(cid:8)propositional(cid:5) and provides a natural extension of Bayesian Probabilistic Horn abduction and Bayesian networks (cid:10) networks to a non(cid:8)propositional language(cid:9) The work presented in this paper should be contrasted with other at(cid:8) tempts to combine logic and probability in very powerful languages (cid:15)e(cid:9)g(cid:9)(cid:5) (cid:2)(cid:10)(cid:7)(cid:16)(cid:9) We are trying to (cid:18)nd the simplest language that is useful for our pur(cid:8) poses(cid:5) rather than combinemanydi(cid:13)erent features onto one framework(cid:9) Our goal in this research is to investigate a simple yet powerful logic(cid:9) The representation proposed in this paper is interesting in its own right as a compromise between epistemic and heuristic adequacy (cid:2)(cid:10)(cid:10)(cid:7)(cid:9) It extends pure Prolog in a simple way to include probabilities(cid:9) While all of the hy(cid:8) potheses are independent(cid:5) by inventing new hypotheses(cid:5) we can represent any probabilistic dependency(cid:9) This simplicity allows us to experiment with a minimalist representation and only extend it when we need to(cid:9) It is inter(cid:8) esting to see how far we can go with a very simple representation language(cid:5) only adding to it when it fails to do what we it want to do(cid:9) (cid:0)(cid:1)(cid:0) A Motivating Example Before we present the language and the assumptions behind the representa(cid:8) tion(cid:5) we (cid:18)rst give an example to show what sorts of things we can represent(cid:9) The example is based on the three cascaded inverters of (cid:2)(cid:0)(cid:3)(cid:7)(cid:9) Figure (cid:0) shows the connections of the inverters(cid:9) in(i1) out(i3) i1 i2 i3 Figure (cid:0)(cid:14) Three cascaded inverters(cid:9) The language is an extension of pure Prolog(cid:9) We can write Prolog(cid:8)like de(cid:18)nite clauses to represent the general knowledge of the domain(cid:5) and the speci(cid:18)c instances known about the particular con(cid:18)guration(cid:14) val(cid:15)out(cid:15)G(cid:16)(cid:0)on(cid:0)T(cid:16)(cid:0) ok(cid:15)G(cid:16)(cid:1)val(cid:15)in(cid:15)G(cid:16)(cid:0)o(cid:0)(cid:0)T(cid:16)(cid:1) Probabilistic Horn abduction and Bayesian networks (cid:3) val(cid:15)out(cid:15)G(cid:16)(cid:0)o(cid:0)(cid:0)T(cid:16) (cid:0) ok(cid:15)G(cid:16)(cid:1)val(cid:15)in(cid:15)G(cid:16)(cid:0)on(cid:0)T(cid:16)(cid:1) val(cid:15)out(cid:15)G(cid:16)(cid:0)V(cid:0)T(cid:16)(cid:0) shorted(cid:15)G(cid:16)(cid:1)val(cid:15)in(cid:15)G(cid:16)(cid:0)V(cid:0)T(cid:16)(cid:1) val(cid:15)out(cid:15)G(cid:16)(cid:0)o(cid:0)(cid:0)T(cid:16) (cid:0) blown(cid:15)G(cid:16)(cid:1) val(cid:15)in(cid:15)G(cid:16)(cid:0)V(cid:0)T(cid:16)(cid:0) conn(cid:15)G(cid:0)(cid:0)G(cid:16) (cid:1)val(cid:15)out(cid:15)G(cid:0)(cid:16)(cid:0)V(cid:0)T(cid:16)(cid:1) conn(cid:15)i(cid:0)(cid:0)i(cid:1)(cid:16)(cid:1) conn(cid:15)i(cid:1)(cid:0)i(cid:10)(cid:16)(cid:1) Here val(cid:15)P(cid:0)V(cid:0)T(cid:16) means that port P has value V at time T(cid:20) in(cid:15)G(cid:16) is the input port of gate G and out(cid:15)G(cid:16) is the output port of gate G(cid:9) ok(cid:15)G(cid:16) means G is working properly(cid:20) shorted(cid:15)G(cid:16) means that G is shorted and acts as a wire(cid:20) and blown(cid:15)G(cid:16) means that G always outputs the value o(cid:0)(cid:9) Thelanguage alsohas a(cid:21)disjointdeclaration(cid:22)that de(cid:18)nesa setofdisjoint and covering hypotheses that have probabilities associated with them(cid:14) disjoint(cid:15)(cid:2)ok(cid:15)G(cid:16) (cid:14) (cid:11)(cid:1)(cid:12)(cid:17)(cid:0)shorted(cid:15)G(cid:16) (cid:14) (cid:11)(cid:1)(cid:11)(cid:10)(cid:0)blown(cid:15)G(cid:16) (cid:14) (cid:11)(cid:1)(cid:11)(cid:1)(cid:7)(cid:16)(cid:1) disjoint(cid:15)(cid:2)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)on(cid:0)T(cid:16)(cid:14) (cid:11)(cid:1)(cid:17)(cid:0)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)T(cid:16) (cid:14) (cid:11)(cid:1)(cid:17)(cid:7)(cid:16)(cid:1) The (cid:18)rst gives the prior probabilities of the states of gates(cid:9) The second gives the prior probabilities of the inputs to the (cid:18)rst gate(cid:9) In our language(cid:5) di(cid:13)er(cid:8) ent instances of disjoint declarations are probabilistically (cid:15)unconditionally(cid:16) independent(cid:5) and so we have stated that the gates break independently(cid:5) and that the values of the input to gate i(cid:0) is independent of the states of the system(cid:9) The sorts of things that we can ask(cid:5) and for which we have given enough information to compute include(cid:14) (cid:2) What is the probability that gate i(cid:1) is ok given that the input to i(cid:0) is o(cid:0) and the output of i(cid:10) is o(cid:0) at time t(cid:0)(cid:23) P(cid:15)ok(cid:15)i(cid:1)(cid:16)jval(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:1)val(cid:15)out(cid:15)i(cid:10)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:16) (cid:15)The answer is (cid:11)(cid:9)(cid:6)(cid:19)(cid:16)(cid:9) (cid:2) If the input of i(cid:0) were on what is the probability that the output of i(cid:10) will be o(cid:0)(cid:23) P(cid:15)val(cid:15)out(cid:15)i(cid:10)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)jval(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)on(cid:0)t(cid:0)(cid:16)(cid:16) (cid:15)The answer is (cid:11)(cid:9)(cid:4)(cid:12)(cid:12)(cid:16)(cid:9) Probabilistic Horn abduction and Bayesian networks (cid:17) (cid:2) What was the probability that the input to i(cid:0) was on at time t(cid:1) given that the output to i(cid:1) was o(cid:13) at time t(cid:1) and given that the output of i(cid:10) was o(cid:13) and the input of i(cid:0) was o(cid:13) at time t(cid:0)(cid:23) P(cid:15)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)on(cid:0)t(cid:1)(cid:16)jval(cid:15)out(cid:15)i(cid:1)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:1)(cid:16)(cid:1)val(cid:15)out(cid:15)i(cid:10)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:1)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:16) (cid:15)the answer is (cid:11)(cid:9)(cid:17)(cid:17)(cid:16)(cid:9) Each of these answers is computedin termsof explanations(cid:5) namelyargu(cid:8) mentswith premisesfrom the possible hypotheses(cid:9) We assume independence amongst the hypotheses so that the prior probability of an explanation is ob(cid:8) tained by multiplyingthe probabilities of the hypotheses in the explanation(cid:9) For example(cid:5) the explanations of the observation val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16) (cid:1) val(cid:15)out(cid:15)i(cid:10)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:5) together with their prior probability are(cid:14) Explanation(cid:14) (cid:2)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:0)ok(cid:15)i(cid:10)(cid:16)(cid:0)ok(cid:15)i(cid:1)(cid:16)(cid:0)shorted(cid:15)i(cid:0)(cid:16)(cid:7) Prior (cid:24) (cid:11)(cid:9)(cid:11)(cid:0)(cid:10)(cid:17)(cid:3) Explanation(cid:14) (cid:2)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:0)ok(cid:15)i(cid:10)(cid:16)(cid:0)shorted(cid:15)i(cid:1)(cid:16)(cid:0)ok(cid:15)i(cid:0)(cid:16)(cid:7) Prior (cid:24) (cid:11)(cid:9)(cid:11)(cid:0)(cid:10)(cid:17)(cid:3) Explanation(cid:14) (cid:2)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:0)shorted(cid:15)i(cid:10)(cid:16)(cid:0)ok(cid:15)i(cid:1)(cid:16)(cid:0)ok(cid:15)i(cid:0)(cid:16)(cid:7) Prior (cid:24) (cid:11)(cid:9)(cid:11)(cid:0)(cid:10)(cid:17)(cid:3) Explanation(cid:14) (cid:2)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:0)blown(cid:15)i(cid:10)(cid:16)(cid:7) Prior (cid:24) (cid:11)(cid:9)(cid:11)(cid:0)(cid:11)(cid:11)(cid:11) Explanation(cid:14) (cid:2)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:0)ok(cid:15)i(cid:10)(cid:16)(cid:0)ok(cid:15)i(cid:1)(cid:16)(cid:0)blown(cid:15)i(cid:0)(cid:16)(cid:7) Prior (cid:24) (cid:11)(cid:9)(cid:11)(cid:11)(cid:12)(cid:11)(cid:1)(cid:17) Explanation(cid:14) (cid:2)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:0)shorted(cid:15)i(cid:10)(cid:16)(cid:0)blown(cid:15)i(cid:1)(cid:16)(cid:7) Prior (cid:24) (cid:11)(cid:9)(cid:11)(cid:11)(cid:11)(cid:10)(cid:11)(cid:11)(cid:11) Explanation(cid:14) (cid:2)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:0)shorted(cid:15)i(cid:10)(cid:16)(cid:0)shorted(cid:15)i(cid:1)(cid:16)(cid:0)shorted(cid:15)i(cid:0)(cid:16)(cid:7) Prior (cid:24) (cid:11)(cid:9)(cid:11)(cid:11)(cid:11)(cid:11)(cid:0)(cid:10)(cid:17)(cid:11) Explanation(cid:14) (cid:2)val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:0)shorted(cid:15)i(cid:10)(cid:16)(cid:0)shorted(cid:15)i(cid:1)(cid:16)(cid:0)blown(cid:15)i(cid:0)(cid:16)(cid:7) Prior (cid:24) (cid:11)(cid:9)(cid:11)(cid:11)(cid:11)(cid:11)(cid:11)(cid:12)(cid:11)(cid:11)(cid:11) Probabilistic Horn abduction and Bayesian networks (cid:19) By the way the knowledge base was constructed(cid:5) these explanations are disjoint and covering(cid:5) and so we can compute the prior probability of val(cid:15)in(cid:15)i(cid:0)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:1)val(cid:15)out(cid:15)i(cid:10)(cid:16)(cid:0)o(cid:0)(cid:0)t(cid:0)(cid:16)(cid:16) by summing the probabilities of these explanations(cid:5) which here is (cid:11)(cid:9)(cid:11)(cid:17)(cid:12)(cid:12)(cid:19)(cid:9) (cid:1) Probabilistic Horn Abduction In this section we develop the language(cid:5) the assumptions behind it and a (cid:21)semantics(cid:22) of the language in termsof abduction(cid:9) A more formal semantics is provided in Appendix A(cid:9) The language is designed to be usable and does not allow us to state what cannot be computedina straight forward manner(cid:9) The initial language is translated into an abductive framework with a number of assumptions about the knowledge base(cid:9) The appendix gives a more formal model(cid:8)theoretic semantics and demonstrates the equivalence between the two(cid:9) (cid:2)(cid:1)(cid:0) The Probabilistic Horn abduction language Our language uses the Prolog conventions (cid:2)(cid:19)(cid:3)(cid:7)(cid:14) De(cid:0)nition (cid:1)(cid:2)(cid:3) A termis eithera variable (cid:15)starting with an upper case let(cid:8) ter(cid:16)(cid:5)aconstant(cid:15)startingwithalowercaseletter(cid:16)orisoftheformf(cid:15)t(cid:0)(cid:0)(cid:3)(cid:3)(cid:3)(cid:0)tn(cid:16) where f is a function symbol (cid:15)starting with a lower case letter(cid:16) and each ti is a term(cid:9) An atomic symbol (cid:15)atom(cid:16) is of the form p or p(cid:15)t(cid:0)(cid:0)(cid:3)(cid:3)(cid:3)(cid:0)tn(cid:16) where p is a predicate symbol (cid:15)starting with a lower case letter(cid:16) and each ti is a term(cid:9) De(cid:0)nition (cid:1)(cid:2)(cid:1) A de(cid:0)nite clause is of the form(cid:14) a(cid:1) or a (cid:0) a(cid:0) (cid:1) (cid:1)(cid:1)(cid:1)(cid:1) an(cid:1) where a and each ai are atomic symbols(cid:9) De(cid:0)nition (cid:1)(cid:2)(cid:4) A disjoint declaration is of the form disjoint(cid:15)(cid:2)h(cid:0) (cid:14) p(cid:0)(cid:0)(cid:3)(cid:3)(cid:3)(cid:0)hn (cid:14) pn(cid:7)(cid:16)(cid:1) where the hi are atoms(cid:5) and the pi are real numbers (cid:11) (cid:4) pi (cid:4) (cid:0) such that p(cid:0) (cid:25) (cid:3)(cid:3)(cid:3) (cid:25) pn (cid:24) (cid:0)(cid:9) Any variable appearing in one hi must appear in all of the hj (cid:15)i(cid:9)e(cid:9)(cid:5) the hi share the same variables(cid:16)(cid:9) The hi will be referred to as hypotheses or assumables(cid:9) Probabilistic Horn abduction and Bayesian networks (cid:6) De(cid:0)nition (cid:1)(cid:2)(cid:5) A probabilistic Horn abduction theory (cid:15)which will be referred to as a (cid:21)theory(cid:22)(cid:16) is a collection of de(cid:18)nite clauses and disjoint dec(cid:8) larations such that if a ground atom h is an instance of a hypothesis in one disjoint declaration(cid:5) then it is not an instance of another hypothesis in any of the disjoint declarations(cid:9) Given theory T(cid:5) we de(cid:18)ne FT the facts(cid:5) is the set of de(cid:18)nite clauses in T together with the clauses of the form false (cid:0) hi (cid:1)hj where hi and hj both appear in the same disjoint declaration in T(cid:5) and (cid:0) i (cid:5)(cid:24) j(cid:9) Let FT be the set of ground instances of elements of FT(cid:9) HT the hypotheses(cid:5) the set of hi that appears in some disjoint declaration (cid:0) in T(cid:9) Let HT be the set of ground instances of elements of HT(cid:9) (cid:0) (cid:0) (cid:0) PT is a function HT (cid:6)(cid:7) (cid:2)(cid:11)(cid:0)(cid:0)(cid:7)(cid:9) P(cid:15)hi(cid:16) (cid:24) pi where hi is a ground instance of (cid:0) hypothesis hi(cid:5) and hi (cid:14) pi is in a disjoint declaration in T(cid:9) P(cid:15)hi(cid:16) will (cid:0) be the prior probability of hi(cid:9) Where T is understood from context(cid:5) we omit the subscript(cid:9) The disjoint declarations allow a very restricted form of integrity con(cid:8) straints (cid:2)(cid:1)(cid:17)(cid:7)(cid:9) It allow binary integrity constraints (cid:15)the conjunction of two hypotheses is false(cid:16) such that the ground instances of hypotheses formmutu(cid:8) ally exclusive and covering groupings that correspond to random variables(cid:9) A theory will de(cid:18)ne a set of represented atoms that are a subset of the atoms of T(cid:9) The represented atoms will often be not listed explicitly(cid:5) but will be left implicit(cid:15)they will typicallybe instances of hypotheses and heads of clauses(cid:16)(cid:9) The represented atoms are those about which the theory can answer questions(cid:9) Questions about atoms not in the represented atoms will be beyond the scope of the theory(cid:9) The theory is not expected to be able to answer queries outside of its scope(cid:9) (cid:2)(cid:1)(cid:2) Abduction We (cid:18)rst give the language an abductive characterisation(cid:5) using the normal de(cid:18)nition of the de(cid:18)nite clauses(cid:9) This is used to make explicit our assump(cid:8) tions and to build the theory in a natural manner(cid:9) In Appendix A(cid:5) we give Probabilistic Horn abduction and Bayesian networks (cid:4) a model theoretic characterisation that incorporates our assumptions(cid:5) and show the equivalence of the formulations(cid:9) The formulation of abduction used is a simpli(cid:18)ed form (cid:2)(cid:0)(cid:4)(cid:7) of Theorist (cid:2)(cid:17)(cid:0)(cid:5) (cid:3)(cid:10)(cid:7)(cid:9) It is simpli(cid:18)ed in being restricted to Horn clauses(cid:9) This can also be seen as a generalisation of an ATMS (cid:15)with prede(cid:18)ned nogoods(cid:16) (cid:2)(cid:17)(cid:12)(cid:7) to be (cid:0) non(cid:8)propositional (cid:9) An abductive scheme is a pair hF(cid:0)Hi where F is a set of Horn clauses(cid:9) Variables in F are implicitlyuniversally quanti(cid:8) (cid:0) (cid:18)ed(cid:9) Let F be the set of ground instances of elements of F(cid:9) H is a setof (cid:15)possibly open(cid:16) atoms(cid:5)calledthe(cid:21)assumables(cid:22) or the(cid:21)possible (cid:0) hypotheses(cid:22)(cid:9) Let H be the set of ground instances of elements of H(cid:9) De(cid:0)nition (cid:1)(cid:2)(cid:6) (cid:2)(cid:17)(cid:0)(cid:5) (cid:3)(cid:1)(cid:7) If g is a closed formula(cid:5) an explanation of g from (cid:0) hF(cid:0)Hi is a set D of elements of H such that (cid:2) F (cid:8)D j(cid:24) g and (cid:2) F (cid:8)D (cid:5)j(cid:24) false(cid:9) The (cid:18)rst condition says that D is su(cid:26)cient to imply g(cid:5) and the second says that D is possible(cid:9) De(cid:0)nition (cid:1)(cid:2)(cid:7) A minimal explanation of g is an explanation of g such that no strict subset is an explanation of g(cid:9) (cid:2)(cid:1)(cid:3) Assumptions about the rule base In order to be able to simply interpret our rules probabilistically we make some assumptions about the rules and some probabilistic independence as(cid:8) sumptions about the hypotheses(cid:9) The (cid:18)rst assumption is syntactic(cid:5) about the relationship between hy(cid:8) potheses and rules(cid:14) Assumption (cid:1)(cid:2)(cid:8) Thereareno rules inF whose head uni(cid:18)eswitha member of H(cid:9) (cid:0) A main di(cid:2)erence is in the philosophy of use(cid:1) We assume that the Horn clauses are representing the objectlevelknowledge(cid:0)rather than(cid:0)asinanATMS(cid:0)actingasabackend of a problemsolver (cid:3)(cid:4)(cid:4)(cid:5)(cid:1) Probabilistic Horn abduction and Bayesian networks (cid:12) This does not seem to be a very severe restriction(cid:5) in practice(cid:9) It says that we do not want rules to implya hypothesis(cid:9) Presumably(cid:5)if we had rules for h(cid:5) the only reason that we would want to makeh a hypothesis is if it was possible that h just happens to be true (cid:15)without any other (cid:21)cause(cid:22)(cid:16)(cid:9) In this case we can replace h in H by the hypothesis h happens to be true and add the rule h (cid:0) h happens to be true (cid:0) Assumption (cid:1)(cid:2)(cid:9) (cid:15)acyclicity (cid:2)(cid:0)(cid:7)(cid:16) If F is the set of ground instances of ele(cid:8) ments of F(cid:5) it is possible to assign a natural number to every ground atom (cid:0) such that for every rule in F the atoms in the body of the rule are strictly less than the atom in the head(cid:9) This assumption is described as natural by Apt and Bezem (cid:2)(cid:0)(cid:7)(cid:9) It is a generalisation of the hierarchicalconstraint of Clark (cid:2)(cid:4)(cid:7)(cid:9) It impliesthat there are no in(cid:18)nite chains when backchaining from any ground goal(cid:9) This does not restrictrecursion(cid:5) but does meanthat allrecursion mustbe wellfounded(cid:9) These assumptions are made implicitly in (cid:2)(cid:3)(cid:10)(cid:7)(cid:5) are explicit in (cid:2)(cid:12)(cid:7) (cid:15)who make the hierarchical constraint rather than the acyclic constraint(cid:16)(cid:5) but are relaxed in (cid:2)(cid:1)(cid:3)(cid:7)(cid:9) When using abduction we often assume that the explanations are cover(cid:8) ing(cid:9) This can be a valid assumption if we have anticipated all eventualities(cid:5) and the observations are within the domain of the expected observations (cid:15)usually if this assumption is violated there are no explanations(cid:16)(cid:9) This is also supported by recent attempts at a completion semantics for abduction (cid:2)(cid:3)(cid:10)(cid:5) (cid:12)(cid:5) (cid:1)(cid:3)(cid:7)(cid:9) The results show how abduction can be considered as deduction on the (cid:21)closure(cid:22) of the knowledge base that includes statements that the given causes are the only causes(cid:9) We make this assumption explicit here(cid:14) (cid:0) Assumption (cid:1)(cid:2)(cid:10) The rules in F for every ground non(cid:8)hypothesis repre(cid:8) sented atom are covering(cid:9) (cid:0) That is(cid:5) if the rules for a in F are a (cid:0) B(cid:0) a (cid:0) B(cid:1) (cid:9) (cid:9) (cid:9) a (cid:0) Bm Probabilistic Horn abduction and Bayesian networks (cid:0)(cid:11) if a is true(cid:5) one of the Bi is true(cid:9) The completion of a is a (cid:9) B(cid:0) (cid:10)(cid:3)(cid:3)(cid:3)(cid:10)Bn Thus the covering assumption (cid:1)(cid:9)(cid:12) says that Clark(cid:27)s completion (cid:2)(cid:4)(cid:7) is valid for every non(cid:8)assumable(cid:9) If the rules for a are not covering(cid:5) we create a new hypothesis a is true for some other reason and add the rule a (cid:0) a is true for some other reason(cid:1) Lemma (cid:1)(cid:2)(cid:3)(cid:11) (cid:2)(cid:12)(cid:7) Under assumptions (cid:1)(cid:2)(cid:3)(cid:4) (cid:1)(cid:2)(cid:5) and (cid:1)(cid:2)(cid:6)(cid:4) if expl(cid:15)g(cid:0)T(cid:16) is the (cid:0) set of all minimal explanations of g from hFT(cid:0)HTi(cid:4) and comp(cid:15)T(cid:16) is FT aug(cid:7) mented with the completion of every ground instance of every non(cid:7)assumable (cid:8)including Clark(cid:9)s equational theory (cid:10)(cid:5)(cid:11)(cid:12)(cid:4) then (cid:0) (cid:3) comp(cid:15)T(cid:16) j(cid:24) g (cid:9) ei (cid:2) (cid:1) ei(cid:1)expl(cid:2)g(cid:0)T(cid:3) A The next assumption has to do with the status of explanations (cid:0) Assumption (cid:1)(cid:2)(cid:3)(cid:3) The bodies of the rules in F for an atom are mutually exclusive(cid:9) Giventhe aboverules for athis meansthat Bi(cid:1)Bj is always falsefor each i (cid:5)(cid:24) j(cid:9) To ensure that this assumption holds we can add extra conditions to the rules(cid:9) See section (cid:17)(cid:9) Note that(cid:5) whereas assumptions (cid:1)(cid:9)(cid:6) and (cid:1)(cid:9)(cid:4) are syntactic assumptions about the theory that can be automatically checked(cid:5) assumptions (cid:1)(cid:9)(cid:12) and (cid:1) (cid:1)(cid:9)(cid:0)(cid:0) are statements about the world(cid:5) and not about the knowledge base (cid:9) We do not require FT j(cid:24) (cid:11)(cid:15)Bi (cid:1) Bj(cid:16)(cid:9) The language is not powerful enough to state such constraints(cid:9) For example(cid:5) it may be the case that the value (cid:1) Thereis(cid:0)however(cid:0)asyntacticconditionthatcanbeusedtocheckwhetherassumption (cid:6)(cid:1)(cid:4)(cid:4) has been violated(cid:1) This is when we can derive the bodies of two rules for an atom fromasetofassumptionsthatarenotinconsistent(cid:1) Forexample(cid:0)iffa(cid:0)nagandfb(cid:0)nbgare disjoint and covering sets of assumptions (cid:7)and so a and b are independent by assumption (cid:6)(cid:1)(cid:4)(cid:8)(cid:9)(cid:0)andwehaverules fc(cid:1)a(cid:0)c(cid:1)bg(cid:0)then weknowthe disjointbodies assumption(cid:6)(cid:1)(cid:4)(cid:4) has been violated(cid:0) as a and b cannot be both exclusive and independent(cid:1)
Description: