Table Of Content

Rewritability in Monadic Disjunctive Datalog, MMSNP, and Expressive Description Logics Cristina Feier, Antti Kuusisto and Carsten Lutz Fachbereich Informatik, Universityof Bremen, Germany 7 1 0 2 Abstract. We study rewritability of monadic disjunctive Datalog pro- n grams,(thecomplementsof)MMSNPsentences,andontology-mediated a queries (OMQs) based on expressive description logics of the ALC fam- J ily and on conjunctive queries. We show that rewritability into FO and 3 intomonadicDatalog(MDLog)aredecidable,andthatrewritabilityinto Datalogisdecidablewhentheoriginalquerysatisfiesacertaincondition O] related to equality.Weestablish 2NExpTime-completeness for all stud- ied problems except rewritability into MDLog for which there remains L a gap between 2NExpTime and 3ExpTime. We also analyze the shape . s of rewritings, which in the MMSNP case correspond to obstructions, c andgiveanewconstructionofcanonicalDatalogprogramsthatismore [ elementary than existing ones and also applies to formulas with free 1 variables. v 1 3 1 Introduction 2 2 0 In data accesswith ontologies,the premier aim is to answerqueries overincom- . plete and heterogeneous data while taking advantage of the domain knowledge 1 0 provided by an ontology [12,19]. Since traditional database systems are often 7 unaware of ontologies, it is common to rewrite the emerging ontology-mediated 1 queries (OMQs) into more standard database query languages.In fact, the DL- : v Litefamilyofdescriptionlogics(DLs)wasdesignedspecificallysothatanyOMQ i Q =(T,Σ,q) where T is a DL-Lite ontology, Σ a data signature, and q a con- X junctive query, can be rewritten into an equivalent first-order (FO) query that r can then be executed using a standard SQL database system [3,20]. In more a expressive ontology languages, it is not guaranteed that for every OMQ there is an equivalent FO query. For example, this is the case for DLs of the EL and Horn-ALC families [25,40], and for DLs of the expressive ALC family. In many members of the EL andHorn-ALC families, however,rewritabilityinto monadic Datalog (MDLog) is guaranteed, thus enabling the use of Datalog engines for query answering. In ALC and above, not even Datalog-rewritability is gener- ally ensured. Since ontologies emerging from practical applications tend to be structurally simple, though, there is reason to hope that (FO-, MDLog-, and Datalog-) rewritings do exist in many practically relevant cases even when the ontology is formulated in an expressive language. This has in fact been experi- mentally confirmedfor FO-rewritabilityin the EL family ofDLs [29], andit has led to the implementation of rewriting tools that, althoughincomplete, are able to compute rewritings in many practical cases [30,39,43]. Fundamentalproblemsthatemergefromthissituationaretounderstandthe exactlimitsofrewritabilityandtoprovide(complete)algorithmsthatdecidethe rewritabilityofagivenOMQandthatcomputearewritingwhenitexists.These problems have been adressed in [10,11,29] for DLs from the EL and Horn-ALC families. For DLs from the ALC family, first results were obtained in [13] where a connection between OMQs and constraint satisfaction problems (CSPs) was established that was then used to transfer decidability results from CSPs to OMQs. In fact, rewritability is an important topic in CSP (where it would be calleddefinability)asitconstitutesacentraltoolforanalyzingthecomplexityof CSPs[23,24,26,32].Inparticular,rewritabilityof(thecomplementof)CSPsinto FO andinto Datalogis NP-complete[6,21,32],andrewritabilityinto MDLog is NP-hardandinExpTime[21].In[13],theseresultswereusedtoshowthatFO- and Datalog-rewritability of OMQs (T,Σ,q) where T is formulated in ALC or a moderateextensionthereofandq is anatomic query(AQ)ofthe formA(x) is decidable and, in fact, NExpTime-complete. For MDLog-rewritability, one can show NExpTime-hardness and containment in 2ExpTime. The aim of this paper is to study the above questions for OMQs where the ontology is formulated in an expressive DL from the ALC family and where the actual query is a conjunctive query (CQ) or a union of conjunctive queries (UCQ). As observed in [13], transitioning in OMQs from AQs to UCQs corresponds to the transition from CSP to its logical generalization MMSNP intro- ducedbyFederandVardi[26]andstudied,forexample,in[14,34–36].Morepre- cisely, while the OMQ language (ALC,AQ) that consists of all OMQs (T,Σ,q) where T is formulated in ALC and q is an AQ has the same expressive power as the complement of CSP (with multiple templates and a single constant), the OMQ language (ALC,UCQ) has the same expressive power as the complement of MMSNP (with free variables)—which in turn is a notational variant of monadicdisjunctiveDatalog(MDDLog).Itshouldbenoted,however,thatwhile alltheseformalismsareequivalentinexpressivepower,theydiffersignificantlyin succinctness [13]; in particular, the best known translation of OMQs into MM- SNP/MDDLog involves a double exponential blowup. In contrast to the CSP case, FO-, MDLog-, and Datalog-rewritability of (the complement of) MMSNP sentenceswasnotknowntobedecidable.Inthispaper,weestablishdecidability of FO- and MDLog-rewritability in (ALC,UCQ) and related OMQ languages, in MDDLog, and the complement of MMSNP. We show that FO-rewritability is 2NExpTime-complete in all three cases, and that MDLog-rewritability is in 3ExpTime; a 2NExpTime lower bound was established in [17]. Let us discuss ourresultsonFO-rewritabilityfromthreedifferentperspectives.FromtheOMQ perspective,the transitionfromAQstoUCQsresultsinanincreaseofcomplex- ity from NExpTime to 2NExpTime. From the monadic Datalog perspective, adding disjunction (transitioning from monadic Datalog to MDDLog) results in a moderate increase of complexity from 2ExpTime [8] to 2NExpTime. And from the CSP perspective, the transition from CSPs to MMSNP results in a rather dramatic complexity jump from NP to 2NExpTime. For Datalog-rewritability, we obtain only partial results. In particular, we show that Datalog-rewritability is decidable and 2NExpTime-complete for MDDLog programs that, in a certain technical sense made precise in the paper, have equality. For the generalcase,we only obtain a potentially incomplete procedure. It is well possible that the procedure is in fact complete, but prov- ing this remains an open issue for now. These results also apply to analogously defined classes of MMSNP sentences and OMQs that have equality. While we mainly focus on deciding whether a rewriting exists rather than actually computing it, we also analyzethe shape that rewritingscantake.Since the shape turns out to be rather restricted, this is important information for algorithms (complete or incomplete) that seek to compute rewritings. In the CSP/MMSNP world, this corresponds to analyzing obstruction sets for MM- SNP, in the style of CSP obstructions [4,18,37] and not to be confused with colored forbidden patterns sometimes used to characterize MMSNP [36]. More precisely, we show that an OMQ (T,Σ,q) is FO-rewritable if and only if it is rewritable into a UCQ in which each CQ has treewidth (1,max{2,n }), n q q the size of q;1 equivalently, the complement of an MMSNP sentence ϕ is FO- definable if and only if it admits a finite set of finite obstructions of treewidth (1,k)wherekisthediameterofϕ(themaximumsizeofanegatedconjunctionin itsbody,inFederandVardi’sterminology).WealsoshowthatanOMQ(T,Σ,q) is MDLog-rewritable if and only if it is rewritable into an MDLog program of diameter (1,max{2,n }); equivalently, the complement of an MMSNP sentence q ϕ is MDLog-definable if andonly if it admits a (potentially infinite) set offinite obstructions of treewidth (1,k) where k is the diameter of ϕ. For the Datalog case, we give a new and direct construction of canonical Datalog-rewritings. It has been observed in [26] that for every CSP and all ℓ,k, it is possible to construct a canonical Datalog program Π of width ℓ and diameter k in the sense that if any such program is a rewriting of the CSP, then so is Π; moreover, even when there is no (ℓ,k)-Datalog rewriting, then Π is the best possible ap- proximation of such a rewriting. The existence of canonical Datalog-rewritings for (the complement of) MMSNP sentences was already known from [15]. How- ever, the construction given there is quite intricate, proceeding via an infinite template that is obtained by applying an intricate construction due to Cherlin, Shelah, andShi [22], whichmakes them rather hardto analyze.Incontrast,our constructionis elementaryand essentially parallelsthe CSP case;it alsoapplies to MMSNP formulas with free variables, where the canonical program takes a rather special form that involves parameters, similar in spirit to the parameters to least fixed-point operators in FO(LFP) [7]. Our main technical tool is the translation of an MMSNP sentence into a CSP exhibited by Feder and Vardi [26]; actually, the targetof the translationis 1 What we mean here is that q has a tree decomposition in which every bag has at most max{2,nq} elements and in which neighboring bag overlap in at most one element. a generalized CSP, meaning that there are multiple templates. The translation is not equivalence preserving and involves a double exponential blowup, but was designed so as to preserve complexity up to polynomial time reductions. Here, we are not so much interested in the complexity aspect, but rather in the semanticrelationshipbetweentheoriginalMMSNPsentenceandtheconstructed CSP. It turns out that the translation does not quite preserve rewritability. In particular, when the original MMSNP sentence has a rewriting, then the natural way of constructing from it a rewriting for the CSP is sound only on instances of high girth. However, FO- and MDLog-rewritings that are sound on high girth (and unconditionally complete) can be converted into rewritings that are unconditionally sound (and complete). The same is true for Datalog- rewritingswhentheMMSNPsentencehasequality,butitremainsopenwhether it is true in the general case. The structure of this paper is as follows. In Section 2, we introduce the es- sentials of disjunctive Datalog and its relevant fragments as well as CSP and MMSNP;infact,we shallalwaysworkwith BooleanMDDLog ratherthanwith the complement of MMSNP. In Section 3, we summarize the main properties of Feder and Vardi’s translation of MMSNP into CSP. We use this in Section 4 to show that FO- and MDLog-rewritability of Boolean MDDLog programs and of the complement of MMSNP sentences is decidable, also establishing the an- nounced complexity results. In Section 5, we analyze the shape of FO- and MDLog-rewritingsand ofobstructions for MMSNP sentences.We alsoestablish an MMSNP analogue of an essential combinatorial lemma for CSPs that al- lows to replace a structure by a structure of high girth while preserving certain homomorphisms (in the CSP case) respectively the truth of certain MMSNP sentences (in the MMSNP case). In Section 6, we study Datalog-rewritability of MDDLog programs that have equality and construct canonical Datalog programs. Section 7 is concerned with lifting our results from the Boolean case to thegeneralcase,concerningthecomplexityofdecidingrewritability,theshapeof rewritings, and the construction of canonical Datalog programs.In this section, Datalogprogramswithparametersplayacentralrole.InSection8,weintroduce OMQs and further lift our results to this setting. We conclude in Section 9. 2 Preliminaries A schema is a finite collection S = (S ,...,S ) of relation symbols with asso- 1 k ciated arity. An S-fact is an expression of the form S(a ,...,a ) where S ∈ S 1 n is an n-ary relation symbol, and a ,...,a are elements of some fixed, count- 1 n ably infinite set const of constants. For an n-ary relation symbol S, pos(S) is {1,...,n}. An S-instance I is a finite set of S-facts. The active domain dom(I) of I is the set of all constants that occur in a fact in I. For an instance I and a schema S, we write I| to denote the restriction of I to the relations in S. S Atreedecomposition ofaninstanceI isapair(T,(B ) ),whereT =(V,E) v v∈V is anundirectedtree and(B ) is a family ofsubsets ofdom(I) suchthat the v v∈V following conditions are satisfied: 1. for all a∈dom(I), {v∈V |a∈B } is nonempty and connected in T; v 2. for every fact R(a ,...a ) in I, there is a v ∈V such that a ,...,a ∈B . 1 r 1 r v Unlike in the traditional setup [27], we are interested in two parameters of tree decompositions instead of only one. We call (T,(B ) ) an (ℓ,k)-tree v v∈V decomposition ifforall v,v′ ∈V,|Bv∩Bv′|≤ℓand|Bv|≤k.AninstanceI has treewidth (ℓ,k) if it admits an (ℓ,k)-tree decomposition. Wenowdefinethenotionofgirth.AfinitestructureI hasacycle oflengthn ifitcontainsdistinctfactsR (a ),...,R (a ), a =a ···a ,andthere 0 0 n−1 n−1 i i,1 i,mi are positions p ,p′ ∈pos(R ), 0≤i<n such that: i i i – p 6=p′ for 1≤i≤n; i i – ai,p′i =ai⊕1,pi⊕1 for 0≤i<n, where ⊕ denotes addition modulo n. The girth of I is the length of the shortest cycle in it and ∞ if I has no cycle (in which case we say that I is a tree). A constraint satisfaction problem (CSP) is defined by an instance T over a schema S , called template. The problem associated with T, denoted CSP(T), E is to decide whether aninput instance I overS admits a homomorphismto T, E denoted I → T. We use coCSP(T) to denote the complement problem, that is, deciding whether I 6→ T. A generalized CSP is defined by a set of templates S over the same schema S and asks for a homomorphism from the input I to at E least one templates T ∈S, denoted I →S. AnMMSNPsentenceθoverschemaS hastheform∃X ···∃X ∀x ···∀x ϕ E 1 n 1 m withX ,...,X monadicsecond-ordervariables,x ,...,x first-ordervariables, 1 n 1 m and ϕ a conjunction of quantifier-free formulas of the form α ∧···∧α →β ∨···∨β with n,m≥0, 1 n 1 m where each α takes the form X (x ) or R(x) with R ∈ S , and each β takes i i j E i the form X (x ). The diameter of θ is the maximum number of variables in i j some implication in ϕ. This presentation is syntactically different from, but semantically equivalent to the original definition from [26], which does not use theimplicationsymbolandinsteadrestrictstheallowedpolaritiesofatoms.Both forms can be interconverted in polynomial time, see [13]. More information on MMSNP can be found, e.g., in [14,15]. Aconjunctivequery(CQ) takestheform∃yϕ(x,y)whereϕisaconjunction ofrelationalatomsandx,ydenotetuplesofvariables;theequalityrelationmay be used. Whenever convenient, we will confuse a CQ ∃yϕ(x,y) with the set of atoms in ϕ. A union of conjunctive queries (UCQ) is a disjunction of CQs with the same free variables. A disjunctive Datalog rule ρ has the form S (x )∨···∨S (x )←R (y )∧···∧R (y ) 1 1 m m 1 1 n n wheren>0andm≥0.We refertoS (x )∨···∨S (x )asthe head ofρ,and 1 1 m m to R (y )∧···∧R (y ) as the body. Every variable that occurs in the head of 1 1 n n a rule ρ is required to also occur in the body of ρ. A disjunctive Datalog (DDLog) program Π is a finite set of disjunctive Dat- alog rules with a selected goal relation goal that does not occur in rule bodies andappearsonly in non-disjunctive goal rules goal(x)←R (x )∧···∧R (x ). 1 1 n n The arity of Π is the arity of the goal relation; we say that Π is Boolean if it has arity zero. Relation symbols that occur in the head of at least one rule of Π are intensional (IDB) relations, and all remaining relation symbols in Π are extensional (EDB) relations. Note that, by definition, goal is an IDB relation. We will sometimes use body atoms of the form true(x) that are vacuously true for all elements of the active domain. This is just syntactic sugar since any rule with body atom true(x) can equivalently be replaced by a set of rules obtained by replacing true(x) in all possible ways with an atom R(x ,...,x ) 1 n where R is an EDB relation and where x = x for some i and all other x are i i fresh variables. A DDLog program is called monadic or an MDDLog program if all its IDB relationswiththepossivleexceptionofgoalhavearityatmostone.Thesize ofa DDLog programΠ is the number of symbols needed to write it (where relation symbols and variables names count one), its width is the maximum arity of non-goal IDB relations used in it, and its diameter is the maximum number of variables that occur in a rule in Π. A Datalog rule is a disjunctive Datalog rule in which the rule head contains exactly one disjunct. Datalog (DLog) programsand monadic Datalog (MDLog) programs are then defined in the expected way. We call a Datalog program an (ℓ,k)-Datalog program if its width is ℓ and its diameter is k. For Π an n-ary MDDLog program over schema S , an S -instance I, and E E a ,...,a ∈ dom(I), we write I |= Π(a ,...,a ) if Π ∪I |= goal(a ,...a ) 1 n 1 n 1 n where the variables in all rules of Π are universally quantified and thus Π is a set of first-order (FO) sentences. A query q over S of arity n is E – sound for Π if for all S -instances I and a ∈ dom(I), I |= q(a) implies E I |=Π(a); – complete for Π if for all S -instances I and a ∈dom(I), I |= Π(a) implies E I |=q(a); – a rewriting of Π if it is sound for Π and complete for Π. To additionally specify the syntactic shape of q, we speak of a UCQ-rewriting, anMDLog-rewriting,andsoon.AnFO-rewriting takestheformofanFO-query that uses only relations from the relevant EDB schema and possibly equality, but neither constants nor function symbols. We say that Π is FO-rewritable if there is an FO-rewriting of Π, and likewise for UCQ-rewritability and MDLog- rewritability. It was shown in [13] that the complement of an MMSNP sentence can be translated into an equivalent Boolean MDDLog program in polynomial time and vice versa; moreover, the transformations preserve diameter and all other parameters relevant for this paper. From now on, we will thus not explicitly distinguish between Boolean MDDLog and (the complement of) MMSNP. S′E-instance I′ SE-instance I SE-instance I′′ r r a c′ c a c′ c a c′ c r R R r r r R R q q q q R r q a′ b′ a′ b′ a′ b′ R r r R q q b b b Fig.1. Translating an S′E-instance into an SE-instance and vice versa 3 MDDLog, Simple MDDLog and CSPs Feder and Vardi show how to translate an MMSNP sentence into a generalized CSP that has the same complexity up to polynomial time reductions [26]. The resulting CSP has a different schema than the originalMMSNP sentence and is thus notequivalent to it. We aregoing to make use ofthis translationto reduce rewritability problems for MDDLog to the corresponding problems for CSPs. Consequently, our main interest is in the precise semantic relationship between theMMSNPsentenceandtheconstructedCSP,ratherthanintheircomplexity. In this section,we sum up the properties of the results obtained in [26] that are relevant for us. These properties are all we need in later sections, that is, we do not need to go further into the details of the translation. For the reader’s convenience and information, we still give full details in Appendix A; these are basedonthe presentationgivenin [17],which is more detailed than the original presentation in [26]. Let S be a schema. A schema S′ is a k-aggregation schema for S if its E E E relations have the form R where q(x) is a CQ over S and the arity of q(x) E R is identical to the number of free variables of q(x), which is at most k. q(x) ThegeneralizedCSPtobeconstructedlatermakesuseofaschemaofthisform. Whatisimportantatthispointisthattherearenaturaltranslationsofinstances between the two schemas. To make this precise, let I be an S -instance. The E corresponding S′ -instance I′ consists of all facts R (a) such that I |= q(a). E q(x) Conversely, let I′ be an S′ -instance. The corresponding S -instance I consists E E of all facts S(b) such that R (a)∈I′ and S(b) is a conjunct of q(a). q(x) Example 1. Assume that S consists of the binary relation r. Let E q(x)=r(x ,x )∧r(x ,x )∧r(x ,x ) 1 2 2 3 3 1 where x = (x ,x ,x ) and let S′ consist of R . Take the S′ -instance I′ 1 2 3 E q(x) E defined by R (a,a′,c′),R (b,b′,a′),R (c,c′,b′). q q q The corresponding S -instance I is E r(a,a′),r(a′,c′),r(c′,a),r(b,b′),r(b′,a′),r(a′,b),r(c,c′),r(c′,b),r(b′,c). Notethat,whenwetaketheS′ -instanceI′′correspondingtoI,wedonot obtain E I′,butratherastrictsupersetthatcontainsadditionalfactssuchasR (c′,b′,a′). q The instances I, I′, and (a subset of) I′′ are depicted in Figure 1. Thetranslationin[26]consistsoftwosteps.WedescribethemhereusingBoolean MDDLoginsteadof(thecomplementof)MMSNP.Thefirststepistotransform the given Boolean MDDLog program Π into a Boolean MDDLog program Π S over a suitable aggregationschema S′ such that Π is of a restricted syntactic E S form. In the second step, one transforms Π into a generalized CSP whose S complement is equivalent to Π . S Westartwithsumminguptheimportantaspectsofthefirststep.ABoolean MDDLog programΠ is simple if it satisfies the following conditions: S 1. everyrule inΠ containsatmostoneEDB atomandthis atomcontainsall S variables of the rule body, each variable exactly once; 2. rules without an EDB atom contain at most a single variable. Now, the first step achieves the following. Theorem 1 ([26]). Given a Boolean MDDLog program Π over EDB schema S of diameter k, one can construct a simple Boolean MDDLog program Π E S over a k-aggregation schema S′ for S such that E E 1. If I is an S -instance and I′ the corresponding S′ -instance, then I |=Π iff E E I′ |=Π ; S 2. If I′ is an S′ -instance and I the corresponding S -instance, then E E (a) I′ |=Π implies I |=Π; S (b) I |=Π implies I′ |=Π if the girth of I′ exceeds k. S If Π is of size n, then the size of Πs and the cardinality of S′ are bounded by E 2p(k·logn), p a polynomial, and the arity of relations in S′ is bounded by k. The E construction takes time polynomial in the size of Π . S ThetranslationunderlyingTheorem1consistsofthreestepsitself:firstsaturate Π by adding all rules that can be obtained from a rule in Π by identifying variables;thenrewriteΠ inanequivalence-preservingwaysothatallrulebodies arebiconnected,introducingfreshunaryandnullaryIDBsasneeded.Andfinally replace the conjunction q(x) of all EDB atoms in each rule body with a single EDB atom R (x), additionally taking care of interactions between the new q(x) EDB relations that arise e.g. when we have two relations R and R such q(x) p(x) that q(x) is contained in p(x) (in the sense of query containment). The following theorem summarizes the second step of the translation of Boolean MDDLog into generalized coCSP. Theorem 2 ([26]). Let Π be a simple Boolean MDDLog program over EDB schema S and with IDB schema S , m the maximum arity of relations in S . E I E Then there exists a set of templates S over S such that Π E 1. Π is equivalent to coCSP(S ); Π 2. |SΠ|≤2|SI| and |T|≤|SE|·2m|SI| for each T ∈SΠ; The construction takes time polynomial in |T|. PT∈SΠ We again sketch the idea underlying the proof of the theorem. The desired set of templates S contains one template for every 0-type, that is, for every set of Π nullaryIDBrelationsinΠ thatdoesnotcontaingoal()andthatsatisfiesallrules inΠ whichuse only nullaryIDBs.Eachtemplates containsone constantc for M every 1-type M, that is, for every set M of unary IDBs that agrees on nullary IDBs with the 0-type for which the template was constructed and that satisfy all rules in Π which use only IDBs that are at most unary. One then interprets all EDB relations in a maximal way so that all rules in Π are satisfied. The fact that Π is simple implies that there are no choices,that is,there is only one maximalinterpretationofeachEDBrelationandtheinterpretationsofdifferent such relations do not interact. Details are given in the appendix. 4 FO- and MDLog-Rewritability of Boolean MDDLog Programs We exploit the translations described in the previous section and the known results that FO-rewritability of CSPs and MDLog-rewritability of coCSPs are decidable to obtain analogous results for Boolean MDDLog, and thus also for MMSNP. In the case of FO, we obtain tight 2NExpTime complexity bounds. For MDLog, the exact complexity remains open (as in the CSP case), between 2NExpTime and 3ExpTime. We start with observing that FO-rewritability and MDLog-rewritability are morecloselyrelatedthanone mightthink atfirstglance.In fact,everyMDLog- rewriting can be viewed as an infinitary UCQ-rewriting and, by Rossman’s ho- momorphismpreservationtheorem[41],FO-rewritabilityofaBooleanMDDLog program coincides with (finitary) UCQ-rewritability. The latter is true also in the non-Boolean case. Proposition 1. Let Π be an MDDLog program. Then Π is FO-rewritable iff Π is UCQ-rewritable. Proof. It is wellknownandeasy to show that truth ofdisjunctive Datalogpro- grams is preserved under homomorphisms. Thus, the proposition immediately follows fromRossman’s theoremin the Booleancase.For the non-Booleancase, weobservethatRossmanestablisheshisresultalsointhe presenceofconstants. Let Π be an MDDLog programand ϕ(x) a rewriting of Π. We can apply Ross- man’s result to ϕ(a), where a is a vector of constants of the same length as x, obtaining a UCQ q(a) equivalent to ϕ(a). Let q(x) be obtained from q(a) be replacing the constants in a with the variables from x. It can be verified that q(x) is a rewriting of Π. ❏ For utilizing the translationofBooleanMDDLog programsto generalizedCSPs in the intended way, the interesting aspect is to deal with the translation of a Boolean MDDLog program Π into a simple program Π stated in Theorem 1, S sinceitisnotequivalencepreserving.Thefollowingpropositionrelatesrewritings of Π to rewritings of Π . S Lemma 1. Let Π be a Boolean MDDLog program of diameter k, Π as in S Theorem 1, and Q∈{UCQ,MDLog,DLog}. Then 1. everyQ-rewritingofΠ caneffectively beconvertedintoaQ-rewritingofΠ; S 2. every Q-rewriting of Π can effectively be converted into a Q-rewriting of Π S that is (i) sound on instances of girth exceeding k and (ii) complete. Proof. Let S and S′ be the EDB schema of Π and of Π , respectively. We E E S start with the case Q=UCQ. ForPoint1,letq beaUCQ-rewritingofΠ .Letq betheUCQobtained ΠS S Π from q by replacing every atom R (y) with q[y/x], that is, with the result ΠS q(x) of replacing the variables x in q(x) with the variables y (which may lead to identifications). We show that q is as required. Let I be an S -instance and Π E I′ the corresponding S′ -instance. Then we have I |=Π iff I′ |=Π (by Point1 E S of Theorem 1) iff I′ |= q (by choice of q ) iff I |= q (by construction of I′ ΠS ΠS Π and of q ). Let us expand on the latter. Π First assume that I′ |= q . Then there is a CQ q in q and a homomor- ΠS ΠS phism h from q to I′. By construction, q contains a CQ q′ that is obtained Π from q by replacing every atom R (y) ∈ q with q[y/x]. Clearly, for every q(x) atomR (y)∈q,wemusthaveR (h(y))∈I′.The constructionofI′ yields q(x) q(x) q(h(y))⊆I. Consequently, h is also a homomorphism from q′ to I. Conversely, assume that there is a CQ q′ in q and a homomorphism h from q′ to I. Then Π thereisaCQq inq fromwhichq′ wasobtainedbythe describedreplacement ΠS operation. For every atom R (y) ∈ q, we must have q(h(y)) ⊆ I. We obtain q(x) R (h(y))∈q and thus h is a homomorphism from q to I′. q(x) For Point 2, let q be a UCQ-rewriting of Π. The UCQ q consists of all Π ΠS CQs that can be obtained as follows: 1. chooseaCQq(x)fromq ,identify variablesinq toobtainaCQq′(x′),and Π choose a partition q (x ),...,q (x ) of q′(x′); 1 1 n n 2. foreachi∈{1,...,n},choosearelationR fromS′ anda vectory of|z| p(z) E variables (repeated occurrences allowed) that are either from x or do not i occur in x′ such that q (x ) ⊆ p[y/z]; then replace q (x ) in q′(x′) with the i i i i single atom R (y). p(z) To establish that q is as desired, we show that for every S′ -instance I′ ΠS E (I) I′ |=q implies I′ |=Π if I′ is of girth exceeding k (soundness) and ΠS S (II) I′ |=Π implies I′ |=q (completeness). S ΠS Let I be the S -instance corresponding to I′. E For Point (I), we observe that I′ |= q implies I |= q (by construction of ΠS Π q and of I′) implies I |=Π (by choice of q ) implies I′ |=Π (by Point2bof ΠS Π S

Rewritability in Monadic Disjunctive Datalog, MMSNP, and Expressive Description Logics PDF

0.47 MB·

by Cristina Feier

#journals #arxiv

Checking for file health...

Save to my drive

Quick download

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Rewritability in Monadic Disjunctive Datalog, MMSNP, and Expressive Description Logics

See more

The list of books you might like

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.