Bottom-Up Evaluation of Datalog: Preliminary Report

Stefan Brass, Heike Stephan
Institut für Informatik, Martin-Luther-Universität Halle-Wittenberg, Germany

Bottom-up evaluation of Datalog has been studied for a long time, and is standard material in textbooks. However, if one actually wants to develop a deductive database system, it turns out that there are many implementation options. For instance, the sequence in which rule instances are applied is not given. In this paper, we study a method that immediately uses a derived tuple to derive more tuples (called the Push method). In this way, storage space for intermediate results can be reduced. The main contribution of our method is the way in which we minimize the copying of values at runtime, and do much work already at compile-time.

1 Introduction

The efficient evaluation of queries expressed as logic programs remains an everlasting problem. Of course, big achievements have been made, but at the same time problem size and complexity grow. Any further progress can increase the practical applicability of logic-based, declarative programming.

Our long-term goal is to develop a new deductive database system. This has many aspects, for instance, language design. However, in the current paper, we exclude all special language features, including negation, and focus on efficient query evaluation for basic Datalog.

The magic set method is the standard textbook method for making bottom-up evaluation goal-directed. Many optimizations have been proposed, including our own SLDMagic method [1] and a method based on Earley deduction [3]. We assume in the current paper that such a rewriting of the program has been done, so we can concentrate on pure bottom-up evaluation.

As we understand it, bottom-up evaluation is an implementation of the $T_P$-operator that computes the minimal model of the program. However, an implementation is free in the order in which it applies the rule instances, while the $T_P$-operator first derives all facts that are derivable with a given set of known facts, before the derived facts are used (in the next iteration).
Furthermore, facts do not have to be stored until the end of query evaluation, but can be deleted as soon as all possible derivations using them have been done, except for the facts that form the answer to the query. Therefore, the sequence of rule instance application becomes important. If one computes predicate by predicate, as in the standard textbook method, one of course needs to store the entire extension of the predicates. However, if one uses derived tuples immediately, it might be possible to store only one tuple of the predicate during the evaluation. Of course, for duplicate elimination and termination, it might still be necessary to store extensions of a few selected predicates.

It is also not given that tuples (facts) must be represented explicitly as records or objects in the program. It suffices if one knows where the values of single columns (predicate arguments) can be found. In this way, a lot of copying can be saved because tuples for the rule head are typically constructed from values bound in the rule body. Of course, one must ensure that the values are not changed before all usages are finished.

S. Schwarz and J. Voigtländer (Eds.): 29th and 30th Workshops on (Constraint) Logic Programming and 24th International Workshop on Functional and (Constraint) Logic Programming (WLP'15/'16/WFLP'16). EPTCS 234, 2017, pp. 13–26, doi:10.4204/EPTCS.234.2. © S. Brass & H. Stephan. This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works License.

Our plan is to translate Datalog to C++, and to generate executable code from the resulting program. This permits the use of existing compilers for low-level optimizations and gives an interface for defining built-in predicates. In [2], we already discussed implementation alternatives for bottom-up evaluation and did performance comparisons for a few example programs. Now we will improve the “push method” from that paper by changing the set of variables used to represent intermediate facts. This is essential for reducing the amount of copying. It also enables us to do more precomputation at “compile time”.
The idea of immediately using derived facts to derive more facts is not new. For instance, variants of semi-naive evaluation have been studied which work in this way [10, 12]. It also seems to be related to the propagation of updates to materialized views. However, the representation of tuples at runtime and the code structure is different from [10] (and this is essential for the reduction of copying values). The paper [12] translates from a temporal Datalog extension to Prolog, which makes any efficiency comparison dependent on implementation details of the Prolog compiler used. We also believe that the rule application graph introduced in our paper is a useful concept. Further literature about the implementation of deductive database systems is, for instance, [8, 4, 9, 11, 7, 13]. A current commercial deductive DB system is LogicBlox [5]. A benchmark collection is OpenRuleBench [6].

2 Basic Definitions

In this paper, we consider basic Datalog, i.e. pure Prolog without negation and without function symbols (i.e. terms can only be variables or constants). We also assume without loss of generality that all rules have at most two body literals. The output of our rewriting methods [1, 3] has this property. (But in any case, it is no restriction since one can introduce intermediate predicates.) Finally, we require range-restriction (allowedness), i.e. all variables in the head of the rule must also appear in a body literal. For technical purposes, we assume that each rule has a unique rule number.

As usual in deductive databases, we assume that EDB and IDB predicates are distinguished (“extensional” and “intensional database”). EDB predicates are defined by facts only, e.g. stored in a relational database or specially formatted files. Also program input is represented in this way. IDB predicates are defined by rules. There is a special IDB-predicate answer that only appears in the head of one or more rules. The task is to compute the extension of this predicate in the minimal model of the program, i.e. all derivable answer-facts.
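The syntactic restrictions above (at most two body literals, range-restriction, a unique rule number) are easy to check mechanically. The following is a minimal sketch of such a check, not the representation used in the paper: the types `Literal` and `Rule`, the function names, and the convention that a term is a variable exactly when it starts with an upper-case letter are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical representation (an assumption, not from the paper):
// a term is a variable if it starts with an upper-case letter,
// otherwise it is a constant.
struct Literal {
    std::string pred;
    std::vector<std::string> args;
};

struct Rule {
    int number;                 // unique rule number
    Literal head;
    std::vector<Literal> body;  // basic Datalog: no negation
};

bool is_variable(const std::string& term) {
    return !term.empty() &&
           std::isupper(static_cast<unsigned char>(term[0])) != 0;
}

// Range-restriction (allowedness): every variable in the head
// must also appear in some body literal.
bool is_range_restricted(const Rule& rule) {
    for (const std::string& t : rule.head.args) {
        if (!is_variable(t)) continue;
        bool found = false;
        for (const Literal& lit : rule.body) {
            if (std::find(lit.args.begin(), lit.args.end(), t) != lit.args.end())
                found = true;
        }
        if (!found) return false;
    }
    return true;
}

// The method assumes that every rule has at most two body literals.
bool has_at_most_two_body_literals(const Rule& rule) {
    return rule.body.size() <= 2;
}
```

A rule such as p(X,Y,a) ← q(Y) ∧ r(X,Y,Z,Z) passes both checks, whereas p(X,W) ← q(X) is rejected because W does not occur in the body.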
We assume that the logic program for the IDB predicates as well as the query (i.e. the answer-rules) are given at “compile time”, whereas the database for the EDB predicates is only known at “runtime”. Since the same program can be executed several times with different database states, any optimization or precomputation we can do at compile time will pay off in most cases. It might even be advantageous in a single execution because the database is large.

Since we want to generate C++ code, we assume that a data type is known for every argument of an EDB predicate. The method does not need type information for IDB predicates (this is implicitly computed).

Data structures for storing relations for EDB predicates can be characterized with binding patterns: A binding pattern for a predicate p with n arguments is a string of length n over the alphabet {b, f}. The letter b (“bound”) means that a value for the corresponding argument is known when the predicate relation is accessed (input), f (“free”) means that a value needs to be looked up (output).

As mentioned above, our rewriting methods [1, 3] produce rules that have at most two body literals. Furthermore, the case of two IDB-literals is rare: it is only used in special cases for translating complex recursions. Most rules have one body literal with IDB-predicate and one with EDB-predicate. Of course, there are also rules with only one body literal (EDB or IDB).

3 Accessing Database Relations

The approach we want to follow is to translate Datalog into C++, which can then be compiled to machine code. Of course, we need an interface to access relations for the EDB predicates. These relations can be stored in a standard relational database, but it is also possible to program this part oneself (at the moment, we do not consider concurrent updates and multi-user access).

We assume that it is possible to open a cursor (scan, iterator) over the relation, which permits looping over all tuples. We assume that for every EDB predicate p there is a class p_cursor with the following methods:

• void open(): Open a scan over the relation, i.e. place the cursor before the first tuple.
• bool fetch(): Move the cursor to the next tuple.
  This function must also be called to access the first tuple. It returns true if there is a first/next tuple, or false if the cursor is at the end of the relation.
• T col_i(): Get the value of the i-th column (attribute) in the current tuple. Here T is the type of the i-th column.
• close(): Close the cursor.

For recursive rules, we will also need

• push(): Save the state of the cursor on a global stack.
• pop(): Restore the state of the cursor.

A relation may have special access structures (e.g. it might be stored in a B-tree, hash table or array). Then not only a full scan (corresponding to binding pattern ff...f) is possible, but also scans only over tuples with given values for certain arguments. We assume that in such cases there are additional cursor classes called p_cursor_β, with a binding pattern β. These classes have the same methods as the other cursor classes, only the open-method has parameters for the bound arguments. E.g. if p is a predicate of arity 3 that permits particularly fast access to tuples with a given value of the first argument, and if this argument has type int, the class p_cursor_bff would have the method open(int x).

4 Duplicate Elimination and Termination

The main contribution of this paper is the way in which copying and materialization of tuples is avoided. Our method basically pushes newly derived facts to body literals where they can be used to derive further facts. However, in the presence of recursion, we must be able to notice whether a derived tuple is new or not. Therefore, in each recursive cycle, at least one predicate must be materialized (“tabled”) to ensure termination. A simple solution is to create hash tables for the predicates in question.

This solution means that we materialize the extensions of some IDB predicates (hopefully, only a few) and copy all data values for the tuples of these predicates. In some cases, information about order or acyclicity might help to avoid this. Information about keys and data distribution could be used to make sensible optimization decisions.
Furthermore, if tuples are produced in a sort order, the duplicate check can be done very efficiently and without storing the predicate extension. All this is subject of our future work.

It is also interesting that the data values in a derived tuple are stored at different times in program variables. For instance, we might know that when p(X,Y) is generated, X only seldom changes, and Y changes much more often. Then a nested relation might be best for tabling the predicate for the purpose of duplicate detection.

Of course, breaking each recursive cycle with a duplicate detection is only the minimum we have to do to ensure termination. Also non-recursive rules can generate duplicates, and in some cases it might be more efficient to detect these duplicates early in order to avoid duplicate computations (since the price for duplicate detection is quite high, in other cases it might be more efficient to simply do the duplicate work).

5 Code Generation: Overall Structure

The result of the translation looks basically as shown in Figure 1. So there are many small code pieces, each with a label that is suitable for a goto. Furthermore, when there are several things to do, e.g. a generated fact can be used in more than one rule, a backtrack point is set up for the second rule, and then a goto is done for the first. When an execution path reaches an end, the switch is left with break, and one of the delayed tasks is taken from the stack. Therefore, each code piece also has a unique number, which can be stored on the backtrack stack, and used in the switch to reach the code piece.

    ⟨Declaration Section⟩;
    ⟨Initialization Section⟩; // Initializes backtrack_stack
    while(!backtrack_stack.is_empty()) {
        switch(backtrack_stack.pop()) {
            case L1: l1:
                ⟨Code Piece 1⟩; // break or goto at end of Code Piece
            case L2: l2:
                ⟨Code Piece 2⟩;
            ...
        }
    }

Figure 1: Overall structure of the generated code

Optimizations are possible, e.g. one can order the code pieces such that some jumps can be eliminated, because the target is immediately following.
Some backtrack points can be avoided by finding a suitable code sequence.

5.1 Declaration Section

Data not known at compile time always originates from the database. In order to minimize copying, we (usually) introduce a C++ variable only for Datalog variables which

• occur in an EDB body literal,
• but do not occur in an IDB body literal of that rule (because then the value comes from another rule, where a variable has been created, if the value is not known at compile time),
• and occur in the head of that rule (because otherwise the value does not really have to be processed in the program).

For instance, consider the following rule:

    p(X,Y,a) ← q(Y) ∧ r(X,Y,Z,Z).

If q is an IDB predicate and r an EDB predicate, we create a C++ variable only for X. A variable or constant for Y exists already when the rule is activated.

In rare cases of recursive rule applications (see Section 5.4 below) we create C++ variables for all variables of the rule.

If the above condition shows that we must create a C++ variable for variable X in rule $r$, we generate the following code line in the declaration section:

    T v⟨r⟩_X;

Here ⟨r⟩ stands for the rule number, so the variable for X in rule 1 is v1_X. We use the prefix with the rule number so that there can be no name conflicts between variables of different rules. T is the C++ data type for the database column in which X occurs.

5.2 Symbolic Facts

A symbolic fact consists of an IDB predicate p and a tuple $(t_1,\ldots,t_n)$ of C++ variables (i.e. their identifiers) and constants, where n is the arity of p. So a symbolic fact represents what is known at compile time about a fact that will be derived at runtime. For some arguments, we might know the exact value (a constant), for other arguments, we know the C++ variable which will contain the value.

An initial set of symbolic facts is derived by rules without IDB body literals. Then our task is to pass each derived symbolic fact to matching IDB body literals and to derive a symbolic fact for the rule head. For each such rule application, a code piece is generated which does the remaining computation at runtime.
The computation of symbolic facts is similar to the standard fixpoint iteration to compute the minimal model (but it is done at “compile time”, when the data for the EDB predicates are not yet known).

“Matching” between a symbolic fact and a body literal means that they are unifiable. In general a full unification must be done (at compile time). Consider e.g. the body literal p(X,X,a) and the symbolic fact p(b,v1_Y,v1_Y). Unifying the first two arguments binds both X and v1_Y to b, which conflicts with the constant a in the third argument. The rule cannot be applied to the symbolic fact, so no code is generated for this case.

5.3 Rule Application Graph

As explained above, we assume that all rules have at most two body literals. A “Symbolic Rule Application” is represented by

• a rule from the logic program with one IDB body literal, together with a symbolic fact matching this body literal, or
• a rule without IDB body literals, or
• a rule with two IDB body literals, with one of the two selected, together with a symbolic fact matching this body literal. (In the rare case of two IDB body literals, we use temporary tables for facts matching each body literal. The symbolic fact in this rule application describes the situation that we just computed a new fact for one of the IDB body literals. For the other body literal we use the table with previously computed facts.)

The result of a symbolic rule application is a symbolic fact. Let $p(t_1,\ldots,t_n)$ be the head of the rule, and $r$ be its rule number. If the rule has an IDB body literal, let $\theta$ be a most general unifier with the input symbolic fact. We require that variable-to-variable bindings are done such that logic variables are replaced by C++ variables. Then the derived symbolic fact is $p(u_1,\ldots,u_n)$, where $u_i$ is

• $t_i$ if $t_i$ is a constant,
• $t_i\theta$ if $t_i$ is a variable which appears in the IDB body literal (if there is one),
• v⟨r⟩_X (with ⟨r⟩ standing for the rule number) if $t_i$ is a variable X which does not appear in the IDB body literal.

Now we can do a standard fixpoint computation to compute all symbolic facts which are derivable from the program.
This process will come to an end, because the number of symbolic facts is bounded: There is only a finite number of C++ variables (at most the number of variables in the given logic program, where variables with the same name in different rules count as distinct). Furthermore, only a finite number of constants occurs in the given logic program (constants which appear only in the database are not known at “compile time” and not used for computing symbolic facts).

The structure of the computation can be shown in a “rule application graph”. It has two types of nodes, namely symbolic facts (“fact nodes”), and symbolic rule applications (“rule nodes”). There is an edge from every symbolic fact to every symbolic rule application which uses the symbolic fact. Furthermore, there is an edge from every symbolic rule application to the symbolic fact it generates.

Of course, it is possible to show only the rule in nodes for symbolic rule applications (since the symbolic fact is identified by the incoming edge, except in the case of two IDB body literals). However, then there can be several nodes marked with the same rule: It is possible that a single rule is compiled several times for different symbolic facts matching its IDB body literal.

Note also that not every application of a recursive rule to a symbolic fact is actually recursive: Only if the same symbolic fact can be generated by applying this rule (maybe indirectly via other rules), we have to be prepared for recursive invocations of the code piece for the symbolic rule application. This can be seen from cycles in the graph.

Finally, nodes in the graph from which there is no path to an answer-node can be eliminated: They do not contribute to the computation of the answer. If the program is the result of a program transformation like magic sets, this path will not be followed at runtime, but it is better not to generate code for it. An example of such a program is

    answer(X) ← q(X,a).
    q(X,Y) ← p(Y,X).
    p(a,X) ← r(X).
    p(b,X) ← s(X).

The rule application graph is shown in Figure 2. The right path is useless.
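The compile-time computation for this example can be sketched in a few lines of C++. This is only an illustration under several assumptions that are not part of the paper: rules have exactly one body literal, the four rules are numbered 1 to 4 in the order shown, terms are plain strings (logic variables start with an upper-case letter, C++ variables look like v3_X, everything else is a constant), and all type and function names (`Literal`, `Rule`, `unify`, `useful_symbolic_facts`, ...) are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Assumed encoding: "X" = logic variable, "v3_X" = C++ variable, "a" = constant.
struct Literal { std::string pred; std::vector<std::string> args; };
struct Rule { int number; Literal head; Literal body; bool body_is_idb; };
using Subst = std::map<std::string, std::string>;

bool is_logic_var(const std::string& t) {
    return std::isupper(static_cast<unsigned char>(t[0])) != 0;
}
bool is_cpp_var(const std::string& t) {
    return t.size() > 1 && t[0] == 'v' &&
           std::isdigit(static_cast<unsigned char>(t[1])) != 0;
}
std::string resolve(const Subst& s, std::string t) {  // follow bindings to the end
    for (auto it = s.find(t); it != s.end(); it = s.find(t)) t = it->second;
    return t;
}
std::string show(const Literal& f) {
    std::string out = f.pred + "(";
    for (std::size_t i = 0; i < f.args.size(); ++i) out += (i ? "," : "") + f.args[i];
    return out + ")";
}

// Unify a body literal with a symbolic fact. Logic variables are replaced by
// C++ variables or constants; a C++ variable may be bound to a constant.
bool unify(const Literal& lit, const Literal& fact, Subst& s) {
    if (lit.pred != fact.pred || lit.args.size() != fact.args.size()) return false;
    for (std::size_t i = 0; i < lit.args.size(); ++i) {
        std::string a = resolve(s, lit.args[i]), b = resolve(s, fact.args[i]);
        if (a == b) continue;
        if (is_logic_var(a) || is_cpp_var(a)) s[a] = b;
        else if (is_cpp_var(b)) s[b] = a;
        else return false;  // two different constants
    }
    return true;
}

// Returns the symbolic facts from which an answer fact is reachable.
std::set<std::string> useful_symbolic_facts(const std::vector<Rule>& rules) {
    std::map<std::string, Literal> facts;                  // fact nodes
    std::set<std::pair<std::string, std::string>> edges;   // input fact -> derived fact
    for (const Rule& r : rules) {                          // rules without IDB body literal
        if (r.body_is_idb) continue;
        Literal f = r.head;
        for (std::string& t : f.args)
            if (is_logic_var(t)) t = "v" + std::to_string(r.number) + "_" + t;
        facts[show(f)] = f;
    }
    for (bool changed = true; changed;) {                  // fixpoint over symbolic facts
        changed = false;
        const std::map<std::string, Literal> snapshot = facts;
        for (const auto& entry : snapshot)
            for (const Rule& r : rules) {
                Subst s;
                if (!r.body_is_idb || !unify(r.body, entry.second, s)) continue;
                Literal f = r.head;
                for (std::string& t : f.args) t = resolve(s, t);
                edges.insert({entry.first, show(f)});
                if (facts.emplace(show(f), f).second) changed = true;
            }
    }
    std::set<std::string> useful;                          // backward reachability
    for (const auto& entry : facts)
        if (entry.second.pred == "answer") useful.insert(entry.first);
    for (bool changed = true; changed;) {
        changed = false;
        for (const auto& e : edges)
            if (useful.count(e.second) && useful.insert(e.first).second) changed = true;
    }
    return useful;
}

// The example program above, with rules numbered 1..4 (an assumption).
std::vector<Rule> example_program() {
    return {
        {1, {"answer", {"X"}}, {"q", {"X", "a"}}, true},
        {2, {"q", {"X", "Y"}}, {"p", {"Y", "X"}}, true},
        {3, {"p", {"a", "X"}}, {"r", {"X"}}, false},
        {4, {"p", {"b", "X"}}, {"s", {"X"}}, false},
    };
}
```

For the example program, the sketch derives five symbolic facts and keeps only p(a,v3_X), q(v3_X,a) and answer(v3_X); the facts p(b,v4_X) and q(v4_X,b) are dropped because q(v4_X,b) does not unify with the body literal q(X,a).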
In the code generation below, we assume that such useless computation paths have been removed, i.e. from every node a fact node with predicate answer is reachable. This in particular means that every fact node with a predicate different from “answer” has an outgoing edge.

[Figure 2: Rule Application Graph with Useless Part (to be eliminated). Left path: p(a,X) ← r(X) yields p(a,v3_X); q(X,Y) ← p(Y,X) yields q(v3_X,a); answer(X) ← q(X,a) yields answer(v3_X). Right path: p(b,X) ← s(X) yields p(b,v4_X); q(X,Y) ← p(Y,X) yields q(v4_X,b), where the path ends.]

5.4 Variable Conflicts

In rare cases of recursive rule applications, it is possible that a rule is applied to a symbolic fact which already contains a variable generated for that rule. An example is

    p(X,Y) ← r(X,Y).
    p(Y,Z) ← p(X,Y) ∧ r(Y,Z).

The first rule generates the symbolic fact p(v1_X,v1_Y). When we insert this into the second rule, we get p(v1_Y,v2_Z). Now we have to insert this again into the second rule: v2_Z contains the input value for Y, but must also be set with a new data value from r. In this case, some copying seems unavoidable. While there are optimizations possible, the simplest solution is to create a C++ variable for each logical variable of the rule, and to copy first the values from the input fact to the right variable (which might need temporary variables, e.g. for swapping the values of two variables). For recursive rule applications, the previous variable values are also stored on a stack (see Section 5.6 below).

5.5 Labels for Code Pieces

We need a goto label and/or a case selector value (a unique number) for each code piece implementing a symbolic rule application. We write

    l_start(p(t_1,...,t_n), r, p(u_1,...,u_n))

for the goto-label of the code piece for application of rule $r$ with body literal $p(u_1,\ldots,u_n)$ to the symbolic fact $p(t_1,\ldots,t_n)$. Of course, instead of listing the body literal $p(u_1,\ldots,u_n)$ explicitly, one could also use its position number in the rule $r$. In any case, the implementation will replace this by l_start_n with some unique number n.
The symbolic constant for the case-value is written as L_START(...) (and also made a legal C++ identifier by using the same unique number). Sometimes there are continuations or other code pieces, therefore the label is marked as “start”.

5.6 Protection of Variable Values

Of course, when a code piece corresponding to a symbolic rule application is executed, the C++ variables in the symbolic fact $p(t_1,\ldots,t_n)$ must still have the same value as when this task was generated. It is possible that the ID/label of the code piece was pushed on the backtrack stack and it is executed only later. However, for every C++ variable, a new value is assigned only in code pieces for the single rule for which the variable was introduced (to hold a data value for an EDB literal in that rule).

Furthermore, it is important that the backtrack points are kept on a stack. So we will return to that rule only after all backtrack points which use the value (and are thus generated later) have been processed, unless the rule is recursive. In this case, the variable value must be saved (on another stack suitable for the data type), and we put the ID of a code piece on the backtrack stack which restores the variable value. This is done whenever we enter a recursive rule, and only for variables set in this rule (the derived fact might contain also variables passed from elsewhere and not changed in the rule).

If the backtrack stack shrinks below this point, all usages of the new variable value are done, and the old value is restored, so that older backtrack points find the value which was current when the backtrack point was created.

6 Code Pieces

In this section, we define a number of code pieces which are translations of different types of rules. Each code piece corresponds to a symbolic rule application. For simplicity, we do not consider variable conflicts (Section 5.4) here.

6.1 IDB-Facts

Suppose the program contains an IDB-fact $p(c_1,\ldots,c_n)$.
For each body literal $p(t_1,\ldots,t_n)$ of a rule $r$ that unifies with the fact $p(c_1,\ldots,c_n)$, the case selector value

    L_START(p(c_1,...,c_n), r, p(t_1,...,t_n))

is pushed on the backtrack stack during initialization.

6.2 One EDB-Body Literal

Consider the rule $p(t_1,\ldots,t_n) \leftarrow r(u_1,\ldots,u_m)$ where r is an EDB predicate. Let $r$ be the rule number; inside the identifiers below it appears as the placeholder ⟨r⟩ (not to be confused with the predicate r). Let $p(\bar t_1,\ldots,\bar t_n)$ be the symbolic fact generated by the rule ($\bar t_i := t_i$ if $t_i$ is a constant, and $\bar t_i :=$ v⟨r⟩_X if $t_i$ is the variable X).

Among all possible cursors cursor_r_β for r, choose one such that for all bound argument positions i (i.e. $\beta_i = b$), $u_i$ is a constant. This is always possible because every relation supports a full table scan, i.e. an access path with all argument positions “free”. But obviously, if there are constants among the $u_i$, and there are available indexes, it is best to choose one with the smallest estimated result size. In the declaration section, generate

    cursor_r_β c⟨r⟩;

Define symbolic constants L_INIT_⟨r⟩ and L_CONT_⟨r⟩ as unique numbers for cases in the switch. Generate the following code in the initialization section:

    backtrack_stack.push(L_INIT_⟨r⟩);

All following code is generated in the switch:

1. Generate

    case L_INIT_⟨r⟩:

2. Let $i_1,\ldots,i_k$ be the bound argument positions in β. Generate:

    c⟨r⟩.open(u_{i_1}, ..., u_{i_k});

   (Note that although another case follows, execution simply continues.)

3. Generate:

    case L_CONT_⟨r⟩:

   The following loop (item 4) is left with goto when the first fact is generated. But before the jump, this case label is pushed on the backtrack stack, so that the loop is continued later.

4. Generate

    while(c⟨r⟩.fetch()) {

5. Let $u_{i_1},\ldots,u_{i_k}$ be the constants among the $u_1,\ldots,u_m$ which correspond to free argument positions in β. If $k \geq 1$, generate

    if(c⟨r⟩.col_⟨i_1⟩() != u_{i_1} || ··· || c⟨r⟩.col_⟨i_k⟩() != u_{i_k})
        continue;

   I.e. if the current tuple of the EDB-predicate does not have the required values for the constant arguments, we immediately start the next iteration of the while-loop (i.e. fetch the next tuple).

6. For every variable Y which appears more than once among the $u_1,\ldots,u_m$: Let $u_{i_1},\ldots,u_{i_k}$ be all equal to Y (note that $k \geq 2$). Generate a test that the same value appears in these columns:

    if(c⟨r⟩.col_⟨i_1⟩() != c⟨r⟩.col_⟨i_2⟩() || ··· || c⟨r⟩.col_⟨i_{k-1}⟩() != c⟨r⟩.col_⟨i_k⟩())
        continue;

7. For every variable $X_i$ in the head let $u_j$ be any occurrence of this variable among the $u_1,\ldots,u_m$. Because of the range restriction (allowedness) condition on the rules, $X_i$ must occur in the body. Generate for each $X_i$:

    v⟨r⟩_X_i = c⟨r⟩.col_⟨j⟩();

8. In case the predicate p was selected for a duplicate check, the following must be done here: The result tuple $p(\bar t_1,\ldots,\bar t_n)$ with the current values of the C++ variables is entered into a hash table or other data structure. If the tuple was already present, one simply does “continue;” to skip it.

9. Generate:

    backtrack_stack.push(L_CONT_⟨r⟩);

   This ensures that the while-loop above will be continued later. Since then the values of the variables introduced in the rule will change, this label must be on the stack below every task using the generated tuple.

10. Let $r_1,\ldots,r_k$ be all rules with an IDB body literal $B_i$, $i := 1,\ldots,k$, which matches the generated symbolic fact $p(\bar t_1,\ldots,\bar t_n)$. For $i := 2,\ldots,k$, generate

    backtrack_stack.push(L_START(p(t̄_1,...,t̄_n), r_i, B_i));

    Finally, generate

    goto l_start(p(t̄_1,...,t̄_n), r_1, B_1);

11. Generate

    } // End of while-loop
    break;

    The break; is important if the while-loop ends because no (further) matching fact is found in the relation r. Otherwise, the loop is left with goto when the first/next matching fact is found.

6.3 Two EDB-Body Literals

In the output of SLDMagic, this case does not occur. However, it is easy to extend the above program code. One uses two cursors, one for each body literal, and two nested while-loops. For simplicity, we implement all joins as “nested loop join” (or “index join” if the data structure for the relation supports the corresponding binding pattern).
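The two nested while-loops can be sketched as follows for a rule of the form p(X,Z) ← r(X,Y) ∧ s(Y,Z). This is a simplified illustration rather than generated code: the in-memory class `pair_cursor` and the function `join_r_s` are hypothetical, both relations are assumed to be binary relations over int, and the derived tuples are collected in a vector (instead of being pushed on to a consuming code piece) only so that the result can be inspected.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical in-memory cursor over a binary int relation. It mimics the
// open/fetch/col_i/close interface assumed in Section 3 (full scan only).
class pair_cursor {
public:
    explicit pair_cursor(const std::vector<std::pair<int, int>>& rel) : rel_(rel) {}
    void open() { next_ = 0; }      // place the cursor before the first tuple
    bool fetch() {                  // move to the next tuple; false at the end
        if (next_ >= rel_.size()) return false;
        cur_ = next_++;
        return true;
    }
    int col_1() const { return rel_[cur_].first; }
    int col_2() const { return rel_[cur_].second; }
    void close() {}
private:
    const std::vector<std::pair<int, int>>& rel_;
    std::size_t next_ = 0;
    std::size_t cur_ = 0;
};

// Nested loop join for p(X,Z) <- r(X,Y), s(Y,Z): one cursor per body literal.
std::vector<std::pair<int, int>> join_r_s(
        const std::vector<std::pair<int, int>>& r,
        const std::vector<std::pair<int, int>>& s) {
    std::vector<std::pair<int, int>> result;
    pair_cursor c_r(r);
    pair_cursor c_s(s);
    c_r.open();
    while (c_r.fetch()) {
        c_s.open();                 // restart the inner scan for each outer tuple
        while (c_s.fetch()) {
            if (c_r.col_2() != c_s.col_1())
                continue;           // join condition on the shared variable Y
            result.emplace_back(c_r.col_1(), c_s.col_2());
        }
        c_s.close();
    }
    c_r.close();
    return result;
}
```

With an index on the first column of s, the inner cursor would instead be opened with the bound value c_r.col_2(), which turns the inner full scan into an index lookup.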
Later, sort orders might be used, so that also a “merge join” can be generated.

6.4 One IDB-Body Literal

Consider the rule

    p(t_1,...,t_n) ← q(u_1,...,u_m),

where q is an IDB-predicate. Let $r$ be the number of this rule. Due to partial evaluation done at compile time, several specializations of the same rule might be generated. There is one code piece per symbolic fact $q(\bar u_1,\ldots,\bar u_m)$ which matches the body literal. Let $\theta$ be a most general unifier, where variable-to-variable bindings are done such that logic variables are replaced by C++ variables, i.e. $u_i$ is replaced by $\bar u_i$ if both are variables. The generated symbolic fact is $p(t_1\theta,\ldots,t_n\theta)$. Note that because of the range restriction requirement, every variable among the $t_i$ also appears as a $u_j$, and then it is unified with a constant or a C++ variable. Thus, no new C++ variables are introduced in this case.

1. Generate

    case L_START(q(ū_1,...,ū_m), r, q(u_1,...,u_m)):
    l_start(q(ū_1,...,ū_m), r, q(u_1,...,u_m)):

   If this rule application is activated via backtracking, the case label is used. If it is activated as the first usage of a generated fact, a jump to the goto label is done (as a slight optimization of pushing something on the backtrack stack and immediately popping it again).

2. Now the part of the unification which can only be done at runtime must be generated. Let $V_1,\ldots,V_k$ be all C++ variables which $\theta$ replaces by constants or a different variable (i.e. $V_i\theta \neq V_i$). If $k > 0$, generate:

    if(V_1 != V_1θ || ··· || V_k != V_kθ)
        break;

   So we simply stop executing this code piece if the current fact for q does not unify with the body literal. Then another task will be taken from the backtrack stack in the main loop.

3. Next, if the predicate p was selected for a duplicate check, the code to enter the result tuple $p(t_1\theta,\ldots,t_n\theta)$ with the current values of the C++ variables into a hash table is generated here. If the tuple was already present (so we just computed a duplicate), one simply does “break;” to end the code piece (as under 2 above).

4. Let $r_1,\ldots,r_k$ be all rules with an IDB body literal $B_i$, $i := 1,\ldots,k$, which matches the generated symbolic fact $p(t_1\theta,\ldots,t_n\theta)$. For $i := 2,\ldots,k$, generate

    backtrack_stack.push(L_START(p(t_1θ,...,t_nθ), r_i, B_i));

   Finally, generate

    goto l_start(p(t_1θ,...,t_nθ), r_1, B_1);
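To see how the code pieces of Sections 6.2 and 6.4 fit into the overall structure of Section 5, the following hand-written sketch shows what the generated code might look like for the two-rule program answer(Y) ← p(7,Y) (rule 1) and p(X,Y) ← r(X,Y) (rule 2). It is an illustration under assumptions, not output of the actual system: the in-memory class `r_cursor`, the wrapper function `run`, the use of std::vector as backtrack stack, the concrete label names, and the collection of answers in a vector are all hypothetical choices.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical in-memory cursor for the EDB predicate r(int,int),
// following the open/fetch/col_i interface of Section 3.
class r_cursor {
public:
    explicit r_cursor(const std::vector<std::pair<int, int>>& rel) : rel_(rel) {}
    void open() { next_ = 0; }
    bool fetch() {
        if (next_ >= rel_.size()) return false;
        cur_ = next_++;
        return true;
    }
    int col_1() const { return rel_[cur_].first; }
    int col_2() const { return rel_[cur_].second; }
private:
    const std::vector<std::pair<int, int>>& rel_;
    std::size_t next_ = 0;
    std::size_t cur_ = 0;
};

// Unique case selector values for the code pieces.
enum Label { L_INIT_2, L_CONT_2, L_START_1 };

std::vector<int> run(const std::vector<std::pair<int, int>>& r_rel) {
    // Declaration section: one C++ variable per head variable of rule 2.
    int v2_X = 0;
    int v2_Y = 0;
    r_cursor c2(r_rel);
    std::vector<int> answers;            // assumed way of returning answer facts
    std::vector<Label> backtrack_stack;

    // Initialization section.
    backtrack_stack.push_back(L_INIT_2);

    while (!backtrack_stack.empty()) {
        Label task = backtrack_stack.back();
        backtrack_stack.pop_back();
        switch (task) {
        case L_INIT_2:                   // rule 2: p(X,Y) <- r(X,Y)
            c2.open();
            // execution simply continues with the next case
        case L_CONT_2:
            while (c2.fetch()) {
                v2_X = c2.col_1();
                v2_Y = c2.col_2();
                backtrack_stack.push_back(L_CONT_2);  // continue the scan later
                goto l_start_1;          // use symbolic fact p(v2_X,v2_Y) at once
            }
            break;                       // no further tuple in r
        case L_START_1:                  // rule 1 applied to p(v2_X,v2_Y)
        l_start_1:
            if (v2_X != 7)               // runtime part of the unification
                break;
            answers.push_back(v2_Y);     // derived fact answer(v2_Y)
            break;
        }
    }
    return answers;
}
```

For the relation r = {(7,1), (3,2), (7,4)} the function returns 1 and 4. Only one p-tuple exists at any time, in the variables v2_X and v2_Y, and no tuple is copied before the answer is produced. A smarter code generator would of course pass the constant 7 to an indexed cursor instead of testing it after the fetch.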
