Parallel Evaluation of Multi-Semi-Joins

Jonny Daenen (Hasselt University), Frank Neven (Hasselt University), Tony Tan (National Taiwan University), Stijn Vansummeren (Université Libre de Bruxelles)

ABSTRACT

While services such as Amazon AWS make computing power abundantly available, adding more computing nodes can incur high costs in, for instance, pay-as-you-go plans while not always significantly improving the net running time (aka wall-clock time) of queries. In this work, we provide algorithms for parallel evaluation of SGF queries in MapReduce that optimize total time, while retaining low net time. Not only can SGF queries specify all semi-join reducers, but also more expressive queries involving disjunction and negation. Since SGF queries can be seen as Boolean combinations of (potentially nested) semi-joins, we introduce a novel multi-semi-join (MSJ) MapReduce operator that enables the evaluation of a set of semi-joins in one job. We use this operator to obtain parallel query plans for SGF queries that outvalue sequential plans w.r.t. net time and provide additional optimizations aimed at minimizing total time without severely affecting net time. Even though the latter optimizations are NP-hard, we present effective greedy algorithms. Our experiments, conducted using our own implementation Gumbo on top of Hadoop, confirm the usefulness of parallel query plans, and the effectiveness and scalability of our optimizations, all with a significant improvement over Pig and Hive.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Proceedings of the VLDB Endowment, Vol. 9, No. 10. Copyright 2016 VLDB Endowment 2150-8097/16/06.

1. INTRODUCTION

The problem of evaluating joins efficiently in massively parallel systems is an active area of research (e.g., [2–5,7,8,14,19,25,28]). Here, efficiency can be measured in terms of different criteria, including net time, total time, amount of communication, resource requirements and the number of synchronization steps. As parallel systems aim to bring down the net time, i.e., the difference between query end and start time, it is often considered the most important criterion. The amount of computing power is no longer an issue through the ready availability of services such as Amazon AWS. However, in pay-as-you-go plans, the cost is determined by the total time, that is, the aggregate sum of time spent by all computing nodes. In this paper, we focus on parallel evaluation of queries that minimize total time while retaining low net time. We consider parallel query plans that exhibit low net times and exploit commonalities between queries to bring down the total time.

Semi-joins have played a fundamental role in minimizing communication costs in traditional database systems through their role in semi-join reducers [9,10], facilitating the reduction of communication in multi-way join computations. In more recent work, Afrati et al. [2] provide an algorithm for computing n-ary joins in MapReduce-style systems in which semi-join reducers play a central role. Motivated by the general importance of semi-joins, we study the system aspects of implementing semi-joins in a MapReduce context. In particular, we introduce a multi-semi-join operator MSJ that enables the evaluation of a set of semi-joins in one MapReduce job while reducing resource usage like total time and requirements on cluster size without sacrificing net time. We then use this operator to efficiently evaluate Strictly Guarded Fragment (SGF) queries [6,20]. Not only can this query language specify all semi-join reducers, but also more expressive queries involving disjunction and negation.

We illustrate our approach by means of a simple example. Consider the following SGF query Q:

    SELECT (x,y) FROM R(x,y)
    WHERE (S(x,y) OR S(y,x)) AND T(x,z)

Intuitively, this query asks for all pairs (x,y) in R for which there exists some z such that (1) (x,y) or (y,x) occurs in S and (2) (x,z) occurs in T. To evaluate Q it suffices to compute the following semi-joins

    X_1 := R(x,y) ⋉ S(x,y);
    X_2 := R(x,y) ⋉ S(y,x);
    X_3 := R(x,y) ⋉ T(x,z);

store the results in the binary relations X_1, X_2, or X_3, and subsequently compute ϕ := (X_1 ∪ X_2) ∩ X_3. Our multi-semi-join operator MSJ(S) (defined in Section 4.2) takes a number of semi-join-equations as input and exploits commonalities between them to optimize evaluation. In our framework, a possible query plan for query Q is of the form:

              EVAL(R, ϕ)
             /          \
    MSJ(X_1, X_2)      MSJ(X_3)

In this plan, the calculation of X_1 and X_2 is combined in a single MapReduce job; X_3 is calculated in a separate job; and EVAL(R,ϕ) is a third job responsible for computing the subset of R defined by ϕ. We provide a cost model to determine the best query plan for SGF queries. We note that, unlike the simple query Q illustrated here, SGF queries can be nested in general. In addition, we also show how to generalize the method to the simultaneous evaluation of multiple SGF queries.

The contributions of this paper can be summarized as follows:

1. We introduce the multi-semi-join operator ⋉·(S) to evaluate a set S of semi-joins and present a corresponding MapReduce implementation MSJ(S).
2. We present query plans for basic, that is, unnested, SGF queries and propose an improved version of the cost model presented by [27,36] for estimating their cost. As computing the optimal plan for a given basic SGF query is NP-hard, we provide a fast greedy heuristic Greedy-BSGF.
3. We show that the evaluation of (possibly nested) SGF queries can be reduced to the evaluation of a set of basic SGF queries in an order consistent with the dependencies induced by the former. In this way, computing an optimal plan for a given SGF query (which is NP-hard as well) can be approximated by first determining an optimal subquery ordering, followed by an optimal evaluation of these subqueries. For the former, we present a greedy algorithm called Greedy-SGF.
4. We experimentally assess the effectiveness of Greedy-BSGF and Greedy-SGF and obtain that, backed by an updated cost model, these algorithms successfully manage to bring down total times of parallel evaluation, making it comparable to that of sequential query plans, while still retaining low net times. This is especially true in the presence of commonalities among the atoms of queries. Finally, our system outperforms Pig and Hive in all aspects when it comes to parallel evaluation of SGF queries and displays interesting scaling characteristics.

Outline. This paper is organized as follows. We discuss related work in Section 2. We introduce the strictly guarded fragment (SGF) queries and discuss MapReduce and the accompanying cost model in Section 3. In Section 4, we consider the evaluation of multi-semi-joins and SGF queries. In Section 5, we discuss the experimental validation. We conclude in Section 6.

2. RELATED WORK

Recall that first-order logic (FO) queries are equivalent in expressive power to the relational algebra (RA) and form the core fragment of SQL queries (cf., e.g., [1]). Guarded-fragment (GF) queries have been studied extensively by the logicians in 1990s and 2000s, and they emerged from the intensive efforts to obtain a syntactical classification of FO queries with decidable satisfiability problems. For more details, we refer the reader to the highly influential paper by Andreka, van Benthem, and Nemeti [6], as well as survey papers by Grädel and Vardi [23,35]. In traditional database terms, GF queries are equivalent in expressive power to semi-join algebra [26]. Closely related are freely acyclic GF queries, which are GF queries restricted to using only the ∧ operator and guarded existential quantifiers [31]. Flum et al. [20] introduced the term strictly guarded fragment queries for queries of the form ∃ȳ(α∧ϕ). That is, guarded fragment queries without Boolean combinations at the outer level. We consider a slight generalization of these queries as explained in Remark 1.

In general, obtaining the optimal plan in SQL-like query evaluation, even in centralized computation, is a hard problem [13,24]. Classic works by Yannakakis and Bernstein advocate the use of semi-join operations to optimize the evaluation of conjunctive queries [9,10,39]. A lot of work has been invested to optimize query evaluation in Pig [22,29], Hive [34] and SparkSQL [38] as well as in MapReduce setting in general [12]. None of them target SGF queries directly.

Tao, Lin and Xiao [33] studied minimal MapReduce algorithms, i.e. algorithms that scale linearly to the number of servers in all significant aspects of parallel computation such as reduce compute time, bits of information received and sent, as well as storage space required by each server. They show that, among many other problems, a single semi-join query between two relations can be evaluated by a one round minimal algorithm. This is a simpler problem, as a single basic SGF query may involve multiple semi-join queries. Afrati et al. [2] introduced a generalization of Yannakakis' algorithm (using semi-joins) to a MapReduce setting. Note that Yannakakis' algorithm starts with a sequence of semi-join operations, which is a (nested) SGF query in a very restricted form.

3. PRELIMINARIES

We start by introducing the necessary concepts and terminologies. In Section 3.1, we define the strictly guarded fragment queries, while we discuss MapReduce in Section 3.2 and the cost model in Section 3.3.

3.1 Strictly Guarded Fragment Queries

In this section, we define the strictly guarded fragment queries (SGF) [20], but use a non-standard, SQL-like notation for ease of readability.

We assume given a fixed infinite set D = {a,b,...} of data values and a fixed collection of relation symbols S = {R,S,...}, disjoint with D. Every relation symbol R ∈ S is associated with a natural number called the arity of R. An expression of the form R(ā) with R a relation symbol of arity n and ā ∈ D^n is called a fact. A database DB is then a finite set of facts. Hence, we write R(ā) ∈ DB to denote that a tuple ā belongs to the R relation in DB.

We also assume given a fixed infinite set V = {x,y,...} of variables, disjoint with D and S. A term is either a data value or a variable. An atom is an expression of the form R(t_1,...,t_n) with R a relation symbol of arity n and each of the t_i a term, i ∈ [1,n]. (Note that every fact is also an atom.) A basic strictly guarded fragment (BSGF) query (or just a basic query for short) is an expression of the form

    Z := SELECT x̄ FROM R(t̄) [ WHERE C ];    (1)

where x̄ is a sequence of variables that all occur in the atom R(t̄), and the WHERE C clause is optional. If it occurs, C must be a Boolean combination of atoms. Furthermore, to ensure that queries belong to the guarded fragment, we require that for each pair of distinct atoms S(ū) and T(v̄) in C it must hold that all variables in ū∩v̄ also occur in t̄. (See also Remark 1 below.) The atom R(t̄) is called the guard of the query, while the atoms occurring in C are called the conditional atoms. We interpret Z as the output relation of the query.

On a database DB, the BSGF query (1) defines a new relation Z containing all tuples ā for which there is a substitution σ for the variables occurring in t̄ such that σ(x̄) = ā, R(σ(t̄)) ∈ DB, and C evaluates to true in DB under substitution σ. Here, the evaluation of C in DB under σ is defined by recursion on the structure of C. If C is C_1 OR C_2, C_1 AND C_2, or NOT C_1, the semantics is the usual boolean interpretation. If C is an atom T(v̄) then C evaluates to true if σ(t̄) ∈ R(t̄)⋉T(v̄), i.e., if there exists a T-atom in DB that equals R(σ(t̄)) on those positions where R(t̄) and T(v̄) share variables.

Example 1. The intersection Z_1 := R ∩ S and the difference Z_2 := R − S between two relations R and S are expressed as follows:

    Z_1 := SELECT x̄ FROM R(x̄) WHERE S(x̄);
    Z_2 := SELECT x̄ FROM R(x̄) WHERE NOT S(x̄);

The semijoin Z_3 = R(x̄,ȳ) ⋉ S(ȳ,z̄) and the antijoin Z_4 = R(x̄,ȳ) ▷ S(ȳ,z̄) are expressed as follows:

    Z_3 := SELECT x̄,ȳ FROM R(x̄,ȳ) WHERE S(ȳ,z̄);
    Z_4 := SELECT x̄,ȳ FROM R(x̄,ȳ) WHERE NOT S(ȳ,z̄);

The following BSGF query selects all the pairs (x,y) for which (x,y,4) occurs in R and either (1,x) or (y,10) is in S, but not both:

    Z_5 := SELECT (x,y) FROM R(x,y,4)
           WHERE (S(1,x) AND NOT S(y,10))
              OR (NOT S(1,x) AND S(y,10));

Finally, the traditional star semi-join between R(x_1,...,x_n) and relations S_i(x_i,y_i), for i ∈ [1,n], is expressed as follows:

    Z_6 := SELECT (x_1,...,x_n) FROM R(x_1,...,x_n)
           WHERE S_1(x_1,y_1) AND ... AND S_n(x_n,y_n);    ✷

A strictly guarded fragment (SGF) query is a collection of BSGFs of the form Z_1 := ξ_1; ...; Z_n := ξ_n; where each ξ_i is a BSGF that can mention any of the predicates Z_j with j < i. On a database DB, the SGF query then defines a new relation Z_n where every occurrence of Z_i is defined by evaluating ξ_i.

Example 2. Let Amaz, BN, and BD be relations containing tuples (title, author, rating) corresponding to the books found at Amazon, Barnes and Noble, and Book Depository, respectively. Let Upcoming contain tuples (newtitle, author) of upcoming books. The following query selects all the upcoming books (newtitle, author) of authors that have not yet received a "bad" rating for the same title at all three book retailers; Z_2 is the output relation:

    Z_1 := SELECT aut FROM Amaz(ttl,aut,"bad")
           WHERE BN(ttl,aut,"bad") AND BD(ttl,aut,"bad");
    Z_2 := SELECT (new,aut) FROM Upcoming(new,aut)
           WHERE NOT Z_1(aut);

Note that this query cannot be written as a basic SGF query, since the atoms in the query computing Z_1 must share the ttl variable, which is not present in the guard of the query computing Z_2. ✷

Remark 1. The syntax we use here differs from the traditional syntax of the Guarded Fragment [20], and is actually closer in spirit to join trees for acyclic conjunctive queries [9,11], although we do allow disjunction and negation in the where clause. In the traditional syntax, a projection in the guarded fragment is only allowed in the form ∃w̄ R(x̄)∧ϕ(z̄) where all variables in z̄ must occur in x̄. One can obtain a query in the traditional syntax of the guarded fragment from our syntax by adding extra projections for the atoms in C. For example,

    SELECT x FROM R(x,y) WHERE S(x,z_1) AND NOT S(y,z_2)

becomes ∃y(R(x,y) ∧ (∃z_1)S(x,z_1) ∧ ¬(∃z_2)S(y,z_2)). We note that this transformation increases the nesting depth of the query. ✷

3.2 MapReduce

We briefly recall the Map/Reduce model of computation (MR for short), and its execution in the open-source Hadoop framework [18,37]. An MR job is a pair (µ,ρ) of functions, where µ is called the map and ρ the reduce function. The execution of an MR job on an input dataset I proceeds in two stages. In the first stage, called the map stage, each fact f ∈ I is processed by µ, generating a collection µ(f) of key-value pairs of the form ⟨k:v⟩. The total collection ⋃_{f∈I} µ(f) of key-value pairs generated during the map phase is then grouped on the key, resulting in a number of groups, say ⟨k_1:V_1⟩,...,⟨k_n:V_n⟩ where each V_i is a set of values. Each group ⟨k_i:V_i⟩ is then processed by the reduce function ρ resulting again in a collection of key-value pairs per group. The total collection ⋃_i ρ(⟨k_i:V_i⟩) is the output of the MR job.

An MR program is a directed acyclic graph of MR jobs, where an edge from job (µ,ρ) → (µ′,ρ′) indicates that (µ′,ρ′) operates on the output of (µ,ρ). We refer to the length of the longest path in an MR program as the number of rounds of the program.

3.3 Cost Model for MapReduce

As our aim is to reduce the total cost of parallel query plans, we need a cost model that estimates this metric for a given MR job. We briefly touch upon a cost model for analyzing the I/O complexity of an MR job based on the one introduced in [27,36] but with a distinctive difference. The adaptation we introduce, and that is elaborated upon below, takes into account that the map function may have a different input/output ratio for different parts of the input data.

While conceptually an MR job consists of only the map and reduce stage, its inner workings are more intricate. Figure 1 summarizes the steps in the execution of an MR job. See [37, Figure 7-4, Chapter 7] for more details. The map phase involves (i) applying the map function on the input; (ii) sorting and merging the local key-value pairs produced by the map function, and (iii) writing the result to local disk.

    Map Phase:    read → map → sort → merge
    Reduce Phase: trans. → merge → reduce → write
    Data flow:    Input → Intermediate Data → Output

Figure 1: A depiction of the inner workings of Hadoop MR.

Let I_1 ∪ ··· ∪ I_k denote the partition of the input tuples such that the mapper behaves uniformly¹ on every data item in I_i. Let N_i be the size (in MB) of I_i, and let M_i be the size (in MB) of the intermediate data output by the mapper on I_i. The cost of the map phase on I_i is:

$$\mathrm{cost}_{map}(N_i, M_i) = h_r N_i + \mathrm{merge}_{map}(M_i) + l_w M_i,$$

where merge_map(M_i), denoting the cost of sort and merge in the map stage, is expressed by

$$\mathrm{merge}_{map}(M_i) = (l_r + l_w)\, M_i \log_D \left\lceil \frac{(M_i + \widehat{M}_i)/m_i}{\mathrm{buf}_{map}} \right\rceil.$$

See Table 1 for the meaning of the variables h_r, l_r, l_w, D, M̂_i, m_i, and buf_map.²

¹ Uniform behaviour means that for every I_i, each input tuple in I_i is subjected to the same map function and generates the same number of key-value pairs. In general, a partition is a subset of an input relation.
² In Hadoop, each tuple output by the map function requires 16 bytes of metadata.

    Symbol    Description
    l_r       local disk read cost (per MB)
    l_w       local disk write cost (per MB)
    h_r       hdfs read cost (per MB)
    h_w       hdfs write cost (per MB)
    t         transfer cost (per MB)
    M̂_i       map output meta-data for I_i (in MB)
    m_i       number of mappers for I_i
    r         number of reducers
    D         external sort merge factor
    buf_map   map task buffer limit (in MB)
    buf_red   reduce task buffer limit (in MB)

Table 1: Description of constants used in the cost model.

The total cost incurred in the map phase equals the sum

$$\sum_{i=1}^{k} \mathrm{cost}_{map}(N_i, M_i). \tag{2}$$

Note that the cost model in [27,36] defines the total cost incurred in the map phase as

$$\mathrm{cost}_{map}\Big(\sum_{i=1}^{k} N_i,\ \sum_{i=1}^{k} M_i\Big). \tag{3}$$

The latter is not always accurate. Indeed, consider for instance an MR job whose input consists of two relations R and S where the map function outputs many key-value pairs for each tuple in R and at most one key-value pair for each tuple in S, e.g., because of filtering. This difference in map output may lead to a non-proportional contribution of both input relations to the total cost. Hence, as shown by Equation (2), we opt to consider different inputs separately. This cannot be captured by map cost calculation of Equation (3), as it considers the global average map output size in the calculation of the merge cost. In Section 5, we illustrate this problem by means of an experiment that confirms the effectiveness of the proposed adjustment.

To analyze the cost in the reduce phase, let M = Σ_{i=1}^{k} M_i. The reduce stage involves (i) transferring the intermediate data (i.e., the output of the map function) to the correct reducer, (ii) merging the key-value pairs locally for each reducer, (iii) applying the reduce function, and (iv) writing the output to hdfs. Its cost will be

$$\mathrm{cost}_{red}(M, K) = t M + \mathrm{merge}_{red}(M) + h_w K,$$

where K is the size of the output of the reduce function (in MB). The cost of merging equals

$$\mathrm{merge}_{red}(M) = (l_r + l_w)\, M \log_D \left\lceil \frac{M/r}{\mathrm{buf}_{red}} \right\rceil.$$

The total cost of an MR job equals the sum

$$\mathrm{cost}_h + \sum_{i=1}^{k} \mathrm{cost}_{map}(N_i, M_i) + \mathrm{cost}_{red}(M, K),$$

where cost_h is the overhead cost of starting an MR job.

4. EVALUATING MULTI-SEMI-JOIN AND SGF QUERIES

In this section, we describe how SGF queries can be evaluated. We start by introducing some necessary building blocks in Sections 4.1 to 4.3, and describe the evaluation of BSGF queries and multiple BSGF queries in Section 4.4 and 4.5, respectively. These are then generalized to the full fragment of SGF queries in Section 4.6 and 4.7.

First, we introduce some additional notation. We say that a tuple ā = (a_1,...,a_n) ∈ D^n of n data values conforms to a vector t̄ = (t_1,...,t_n) of terms, if

1. ∀i,j ∈ [1,n], t_i = t_j implies a_i = a_j; and,
2. ∀i ∈ [1,n] if t_i ∈ D, then t_i = a_i.

For instance, (1,2,1,3) conforms to (x,2,x,y). Likewise, a fact T(ā) conforms to an atom U(t̄) if T = U and ā conforms to t̄. We write T(ā) ⊨ U(t̄) to denote that T(ā) conforms to U(t̄). If f = R(ā) is a fact conforming to an atom α = R(t̄) and x̄ is a sequence of variables that occur in t̄, then the projection π_{α;x̄}(f) of f onto x̄ is the tuple b̄ obtained by projecting ā on the coordinates in x̄. For instance, let f = R(1,2,1,3) and α = R(x,y,x,z). Then, R(1,2,1,3) ⊨ R(x,y,x,z) and hence π_{α;x,z}(f) = (1,3).

4.1 Evaluating One Semi-Join

As a warm-up, let us explain how single semi-joins can be evaluated in MR. A single semi-join is a query of the form

    Z := SELECT w̄ FROM α WHERE κ;    (4)

where both α and κ are atoms. For notational convenience, we will denote this query simply by π_w̄(α ⋉ κ).

To evaluate (4), one can use the following one round repartition join [12]. The mapper distinguishes between guard facts (i.e., facts in DB conforming to α) and conditional facts (i.e., facts in DB conforming to κ). Specifically, let z̄ be the join key, i.e., those variables occurring in both α and κ. For each guard fact f such that f ⊨ α, the mapper emits the key-value pair ⟨π_{α;z̄}(f) : [Req κ; Out π_{α;w̄}(f)]⟩. Intuitively, this pair is a "message" sent by guard fact f to request whether a conditional fact g ⊨ κ with π_{κ;z̄}(g) = π_{α;z̄}(f) exists in the database, stating that if such a conditional fact exists, the tuple π_{α;w̄}(f) should be output. Conversely, for each conditional fact g ⊨ κ, the mapper emits a message of the form ⟨π_{κ;z̄}(g) : [Assert κ]⟩, asserting the existence of a κ-conforming fact in the database with join key π_{κ;z̄}(g). On input ⟨b̄ : V⟩, the reducer outputs all tuples ā to relation Z for which [Req κ; Out ā] ∈ V, provided that V contains at least one assert message.

Example 3. Consider the query Z := π_x(R(x,z) ⋉ S(z,y)) and let I contain the facts {R(1,2), R(4,5), S(2,3)}. Then the mapper emits key-value pairs ⟨2 : [Req S(z,y); Out 1]⟩, ⟨5 : [Req S(z,y); Out 4]⟩ and ⟨2 : [Assert S(z,y)]⟩, which after reshuffling result in groups ⟨5 : {[Req S(z,y); Out 4]}⟩ and ⟨2 : {[Req S(z,y); Out 1], [Assert S(z,y)]}⟩. Only the reducer processing the second group produces an output, namely the fact Z(1). ✷

Cost Analysis. To compare the cost of separate and combined evaluation of multiple semi-joins in the next section, we first illustrate how to analyze the cost of evaluating a single semi-join using the cost model described above. Hereto, let |α| and |κ| denote the total size of all facts that conform to α and κ, respectively. Five values are required for estimating the total cost: N_1, N_2, M_1, M_2 and K. We can now choose M_1 = |α| and M_2 = |κ|. For simplicity, we assume that key-value pairs output by the mapper have the same size as their corresponding input tuples, i.e., N_1 = M_1 and N_2 = M_2.³ Finally, the output size K can be approximated by its upper bound N_1. Correct values for meta-data size and number of mappers can be derived from the number of input records and the system settings.

³ Gumbo uses sampling to estimate M_i (cf. Section 5.1).

4.2 Evaluating a Collection of Semi-Joins

Since a BSGF query is essentially a Boolean combination of semi-joins, it can be computed by first evaluating all semi-joins followed by the evaluation of the Boolean combination. In the present section, we introduce a single-job MR program MSJ that evaluates a set of semi-joins in parallel. In the next section we introduce the single-job MR program EVAL to evaluate the Boolean combination.

We introduce a unary multi-semi-join operator ⋉·(S) that takes as input a set of equations S = {X_1 := π_{x̄_1}(α_1 ⋉ κ_1), ..., X_n := π_{x̄_n}(α_n ⋉ κ_n)}. It is required that the X_i are all pairwise distinct and that they do not occur in any of the right-hand sides. The semantics is straightforward: the operator computes every semi-join π_{x̄_i}(α_i ⋉ κ_i) in S and stores the result in the corresponding output relation X_i.

We now expand the MR job described in Section 4.1 into a job that computes ⋉·(S) by evaluating all semi-joins in parallel. Let z̄_i be the join key of semi-join π_{x̄_i}(α_i ⋉ κ_i). Algorithm 1 shows the single MR job MSJ(S) that evaluates all n semi-joins at once. More specifically, MSJ simulates the repartition join of Section 4.1, but outputs request messages for all the guard facts at once (i.e., those facts conforming to one of the α_i for i ∈ [1,n]). Similarly, assert messages are generated simultaneously for all of the conditional facts (i.e., those facts conforming to one of the κ_i for i ∈ [1,n]). The reducer then reconciles the messages concerning the same κ_i. That is, on input ⟨b̄ : V⟩, the reducer outputs the tuple ā to relation X_i for which [Req (κ_i,i); Out ā] ∈ V, provided that V contains at least one message of the form [Assert κ_i]. The output therefore consists of the relations X_1,...,X_n, with each X_i containing the result of evaluating π_{x̄_i}(α_i ⋉ κ_i).

Algorithm 1 MSJ(X_1 := π_{x̄_1}(α_1 ⋉ κ_1), ..., X_n := π_{x̄_n}(α_n ⋉ κ_n))

    function Map(Fact f)
        buff = []
        for every i such that f ⊨ α_i do
            buff += ⟨π_{α_i;z̄_i}(f) : [Req (κ_i,i); Out π_{α_i;x̄_i}(f)]⟩
        for every i such that f ⊨ κ_i do
            buff += ⟨π_{κ_i;z̄_i}(f) : [Assert κ_i]⟩
        emit buff

    function Reduce(⟨k : V⟩)
        for all [Req (κ_i,i); Out ā] in V do
            if V contains [Assert κ_i] then
                add ā to X_i

Combining the evaluation of a collection of semi-joins into a single MSJ job avoids the overhead of starting multiple jobs, reads every input relation only once, and can reduce the amount of communication by packing similar messages together (cf. Section 5.1). At the same time, grouping all semi-joins together can potentially increase the average load of map and/or reduce tasks, which directly leads to an increased net time. These trade-offs are made more apparent in the following analysis and are taken into account in the algorithm Greedy-BSGF introduced in Section 4.4.

Cost Analysis. Let all κ_i's be different atoms and α_1 = ··· = α_n = α. A similar analysis can be performed for other comparable scenarios. As before, we assume that the size of the key-value pair is the same as the size of the conforming fact, and all tuples conform to their corresponding atom. The cost of MSJ(S), denoted by cost(S), equals

$$\mathrm{cost}_h + \mathrm{cost}_{map}(|\alpha|, n|\alpha|) + \sum_{i=1}^{n} \mathrm{cost}_{map}(|\kappa_i|, |\kappa_i|) + \mathrm{cost}_{red}\Big(n|\alpha| + \sum_{i=1}^{n} |\kappa_i|,\ \sum_{i=1}^{n} |X_i|\Big), \tag{5}$$

where |X_i| is the size of the output relation X_i. If we evaluate each X_i in a separate MR job, the total cost is:

$$\sum_{i=1}^{n} \Big( \mathrm{cost}_h + \mathrm{cost}_{map}(|\alpha|, |\alpha|) + \mathrm{cost}_{map}(|\kappa_i|, |\kappa_i|) + \mathrm{cost}_{red}(|\alpha| + |\kappa_i|,\ |X_i|) \Big). \tag{6}$$

So, single-job evaluation of all X_i's is more efficient than separate evaluation iff Equation (5) is less than Equation (6).

4.3 Evaluating Boolean Combinations

Let X_0, X_1, ..., X_n be relations with the same arity and let ϕ be a Boolean formula over X_1, ..., X_n. It is straightforward to evaluate X_0 ∧ ϕ in a single MR job: on each fact X_i(ā), the mapper emits ⟨ā : i⟩. The reducer hence receives pairs ⟨ā : V⟩ with V containing all the indices i for which ā ∈ X_i, and outputs ā only if the Boolean formula, obtained from X_0 ∧ ϕ by replacing every X_i with true if i ∈ V and false otherwise, evaluates to true. For instance, if ϕ = X_1 ∧ X_2 ∧ ¬X_3, it will emit ā only if V contains 0, 1 and 2 but not 3. We denote this MR job as EVAL(X_0,ϕ). We emphasize that multiple Boolean formulas Y_1 ∧ ϕ_1, ..., Y_n ∧ ϕ_n with distinct sets of variables can be evaluated in one MR job which we denote as EVAL(Y_1,ϕ_1,...,Y_n,ϕ_n).

Cost Analysis. Let |X_i| be the size of relation X_i. Then, when |ϕ| is the size of the output, cost(EVAL(X_0,ϕ)) equals

$$\mathrm{cost}_h + \sum_{i=0}^{n} \mathrm{cost}_{map}(|X_i|, |X_i|) + \mathrm{cost}_{red}\Big(\sum_{i=0}^{n} |X_i|,\ |\varphi|\Big). \tag{7}$$

4.4 Evaluating BSGF Queries

We now have the building blocks to discuss the evaluation of basic queries. Consider the following basic query Q:

    Z := SELECT w̄ FROM R(t̄) WHERE C.

Here, C is a Boolean combination of conditional atoms κ_i, for i ∈ [1,n], that can only share variables occurring in t̄. Note that it is implicit that κ_1,...,κ_n are all different atoms. Furthermore, let S be the set of equations {X_1 := π_w̄(R(t̄) ⋉ κ_1), ..., X_n := π_w̄(R(t̄) ⋉ κ_n)} and let ϕ_C be the Boolean formula obtained from C by replacing every conditional atom κ_i by X_i.

Then, for every partition {S_1,...,S_p} of S, the following MR program computes Q:

               EVAL(R, ϕ_C)
             /      |       \
      MSJ(S_1)     ...     MSJ(S_p)

We refer to any such program as a basic MR program for Q. Notice that all MSJ jobs can be executed in parallel. So, the above program consists in fact of two rounds, but note that there are p+1 MR jobs in total: one for each MSJ(S_i), and one for EVAL(R,ϕ_C).

Example 4. Figure 2 shows three alternative basic MR programs for the following query:

    Z := SELECT x,y FROM R(x,y)
         WHERE S(x,z) AND (T(y) OR NOT U(x))    (8)

In alternative (a), all semijoins X_1, X_2, X_3 are evaluated as separate jobs. In alternative (b), X_1 and X_3 are computed in one job, while X_2 is computed separately. In alternative (c), all semijoins X_1, X_2, X_3 are computed in a single job. ✷

    (a) EVAL(R,Z) on top of MSJ(X_1), MSJ(X_2), MSJ(X_3)
    (b) EVAL(R,Z) on top of MSJ(X_1,X_3), MSJ(X_2)
    (c) EVAL(R,Z) on top of MSJ(X_1,X_2,X_3)

Figure 2: Different possible query plans for the query given in Example 4. Here, X_1 := R(x,y) ⋉ S(x,z), X_2 := R(x,y) ⋉ T(y), X_3 := R(x,y) ⋉ U(x) and Z := X_1 ∧ (X_2 ∨ ¬X_3); trivial projections are omitted.

Cost Analysis. When S is partitioned into S_1 ∪ ··· ∪ S_p, the cost of the MR program is:

$$\mathrm{cost}(\mathrm{EVAL}(R, \varphi_C)) + \sum_{i=1}^{p} \mathrm{cost}(S_i), \tag{9}$$

where cost(S_i) is as in Equation (5).

Computing the Optimal Partition. By BSGF-Opt we denote the problem that takes a BSGF query Q as above and computes a partition S_1 ∪ ··· ∪ S_p of S such that its total cost as computed in Equation (9) is minimal. The Scan-Shared Optimal Grouping problem, which is known to be NP-hard, is reducible to this problem [27]:

Theorem 1. The decision variant of BSGF-Opt is NP-complete.

While for small queries the optimal solution can be found using a brute-force search, for larger queries we adopt the fast greedy heuristic introduced by Wang et al. [36]. For two disjoint subsets S_i, S_j ⊆ S, define:

    gain(S_i,S_j) = cost(S_i) + cost(S_j) − cost(S_i ∪ S_j).

That is, gain(S_i,S_j) denotes the cost gained by evaluating S_i ∪ S_j in one MR job rather than evaluating each of them separately. For a partition S_1 ∪ ··· ∪ S_p, our heuristic algorithm greedily finds a pair i,j ∈ [p]×[p] such that i ≠ j and gain(S_i,S_j) > 0 is the greatest. If there is such a pair i,j, we merge S_i and S_j into one set. We iterate such heuristic starting with the trivial partition S_1 ∪ ··· ∪ S_n, where each S_i = {X_i := π_w̄(R(t̄) ⋉ κ_i)}. The algorithm stops when there is no pair i,j for which gain(S_i,S_j) > 0. We refer to this algorithm as Greedy-BSGF. For a BSGF query Q, we denote by OPT(Q) the optimal (least cost) basic MR program for Q, and by GOPT(Q) we denote the program computed by Greedy-BSGF.

4.5 Evaluating Multiple BSGF Queries

The approach presented in the previous section can be readily adapted to evaluate multiple BSGF queries. Indeed, consider a set of n BSGF queries, each of the form

    Z_i := SELECT w̄_i FROM R_i(t̄_i) WHERE C_i

where none of the C_i can refer to any of the Z_j. A corresponding MR program is then of the form

      EVAL(R_1, ϕ_C1, ..., R_n, ϕ_Cn)
             /      |       \
      MSJ(S_1)     ...     MSJ(S_p)

The child nodes constitute a partition of all the necessary semi-joins. Again, ϕ_Ci is the Boolean formula obtained from C_i. We assume that the set of variables used in the Boolean formulas are disjoint. For a set of BSGF queries F, we refer to any MR program of the above form as a basic MR program for F, whose cost can be computed in a similar manner as above. The optimal basic program for F and the program computed by the greedy algorithm of Section 4.4 are denoted by OPT(F) and GOPT(F), respectively, and their costs are denoted by cost(OPT(F)) and cost(GOPT(F)).

4.6 Evaluating SGF Queries

Next, we turn to the evaluation of SGF queries. Recall that an SGF query Q is a sequence of basic queries of the form Z_1 := ξ_1; ...; Z_n := ξ_n; where each ξ_i can refer to the relations Z_j with j < i. We denote the BSGF Z_i := ξ_i by Q_i. The most naive way to compute Q is to evaluate the BSGF queries in Q sequentially, where each ξ_i is evaluated using the approach detailed in the previous section. This leads to a 2n-round MR program. We would like to have a better strategy that aims at decreasing the total time by combining the evaluation of different independent subqueries.

To this end, let G_Q be the dependency graph induced by Q. That is, G_Q consists of a set F of n nodes (one for each BSGF query) and there is an edge from Q_i to Q_j if relation Z_i is mentioned in ξ_j. A multiway topological sort of the dependency graph G_Q is a sequence (F_1,...,F_k) such that

1. {F_1,...,F_k} is a partition of F;
2. if there is an edge from node u to node v in G_Q, then u ∈ F_i and v ∈ F_j such that i < j.

Notice that any multiway topological sort (F_1,...,F_k) of G_Q provides a valid ordering to evaluate Q, i.e., all the queries in F_i are evaluated before F_j whenever i < j.

Example 5. Let us illustrate the latter by means of an example. Consider the following SGF query Q:

    Q_1 : Z_1 := SELECT x,y FROM R_1(x,y) WHERE S(x)
    Q_2 : Z_2 := SELECT x,y FROM Z_1(x,y) WHERE T(x)
    Q_3 : Z_3 := SELECT x,y FROM Z_2(x,y) WHERE U(x)
    Q_4 : Z_4 := SELECT x,y FROM R_2(x,y) WHERE T(x)
    Q_5 : Z_5 := SELECT x,y FROM Z_3(x,y) WHERE Z_4(x,x)

The dependency graph G_Q is as follows:

    Q_1 → Q_2 → Q_3 → Q_5
                Q_4 → Q_5

There are four possible multiway topological sorts of G_Q:

1. ({Q_1,Q_4}, {Q_2}, {Q_3}, {Q_5}).
2. ({Q_1}, {Q_2,Q_4}, {Q_3}, {Q_5}).
3. ({Q_1}, {Q_2}, {Q_3,Q_4}, {Q_5}).
4. ({Q_1}, {Q_2}, {Q_3}, {Q_4}, {Q_5}). ✷

Let F = (F_1,...,F_k) be a topological sort of G_Q. Since the optimal program OPT(F_i), defined in Subsection 4.5, is intractable (due to Theorem 1), we will use the greedy approach to evaluate F_i, i.e., GOPT(F_i) as defined in Section 4.5. The cost of evaluating Q according to F is

$$\mathrm{cost}(F) = \sum_{i=1}^{k} \mathrm{cost}(\mathrm{GOPT}(F_i)). \tag{10}$$

We define the optimization problem SGF-Opt that takes as input an SGF query Q and constructs a multiway topological sort F of G_Q with minimal cost(F). By reduction from Subset Sum [21] we obtain the following result (cf. [16]):

Theorem 2. The decision variant of SGF-Opt is NP-complete.

In the following, we present a novel heuristic for computing a multiway topological sort of an SGF that tries to maximize the overlap between queries. To this end, we define the overlap between a BSGF query Q and a set of BSGF queries F, denoted by overlap(Q,F), to be the number of relations occurring in Q that also occur in F. For instance, in Example 5, the overlap between Q_2 and {Q_1,Q_3,Q_4,Q_5} is 1 as they share only relation T.

Consider the following algorithm Greedy-SGF that computes a multiway topological sort F of an SGF query Q. Initially, all the vertices in the dependency graph G_Q are colored blue and X = ().

1. Suppose X = (F_1,...,F_m) and blue vertices remain.
2. Let D be the set of those blue vertices in G_Q for which none of the incoming edges are from other blue vertices. (Due to the acyclicity of G_Q, the set D is non-empty if G_Q still has blue vertices.)
3. Find a pair (u,F_i) such that u ∈ D, (F_1,...,F_i ∪ {u},...,F_m) is a topological sort of the vertices {u} ∪ ⋃_i F_i, and overlap(u,F_i) is non-zero.
4. If such a pair (u,F_i) exists, choose one with maximal overlap(u,F_i), and set X = (F_1,...,F_i ∪ {u},...,F_m). Otherwise, set X = (F_1,...,F_m,{u}).
5. Color the vertex u red.

The iteration stops when every vertex in G_Q is red, and hence, X is a multiway topological sort of G_Q. Clearly, the number of iterations is n, where n is the number of vertices in G_Q. Each iteration takes O(n²). Therefore, the heuristic algorithm outlined above runs in O(n³) time.

Note that a naive dynamic evaluation strategy may consist of re-running Greedy-SGF after each BSGF evaluation in order to obtain an updated MR query plan.

4.7 Evaluating Multiple SGF Queries

Evaluating a collection of SGF queries can be done in the same way as evaluating one SGF query. Indeed, we can simply consider the union of all BSGF subqueries. Note that this strategy can exploit overlap between different subqueries, potentially bringing down the total and/or net time.

5. EXPERIMENTAL VALIDATION

In this section, we experimentally validate the effectiveness of our algorithms. First, we discuss our experimental setup in Section 5.1. In Section 5.2, we discuss the evaluation of BSGF queries. In particular, we compare with Pig and Hive and address the effectiveness of the cost model. The experiments concerning nested SGF queries are presented in Section 5.3. Finally, Section 5.4 discusses the overall performance of our own system called Gumbo.

5.1 Experimental Setup

The algorithms Greedy-BSGF and Greedy-SGF are implemented in a system called Gumbo [15,17]. Gumbo runs on top of Hadoop, and adopts several important optimizations:

(1) Message packing, as also used in [36], reduces network communication by packing all the request and assert messages associated with the same key into one list.
(2) Emitting a reference to each guard tuple (i.e., a tuple id) rather than the tuple itself when evaluating (B)SGF queries significantly reduces the number of bytes that are shuffled. To compensate for this reduction, the guard relation needs to be re-read in the EVAL job but the latter is insignificant w.r.t. the gained improvement.
(3) Setting the number of reducers in function of the intermediate data size. An estimate of the intermediate size is obtained through simulation of the map function on a sample of the input relations. The latter estimates are also used as approximate values for N_inp, N_int, and N_out. For the experiments below, 256MB of data was allocated to each reducer.
(4) When the conditional atoms of a BSGF query all have
The algorithm performs the fol- thesamejoin-key,thequerycanbeevaluatedinonejob lowing iteration with the invariant that X is a multiway bycombiningMSJandEVAL.Asimilarreductiontoone topological sort of the red vertices in G: job can be obtained when the the Boolean condition is 738 QID Query Typeofquery Sequential vs. Parallel. We first compare sequential and A1 R(x,y,z,w)⋉ guardsharing parallel evaluation of queries A1–A5 to highlight the major S(x)∧T(y)∧U(z)∧V(w) A2 R(x,y,z,w)⋉ guard & con- differences between sequential and parallel query plans and S(x)∧S(y)∧S(z)∧S(w) ditional name to illustrate the effect of grouping. In particular, we con- sharing siderthreeevaluationstrategiesinGumbo: (i)evaluatingall A3 R(x,y,z,w)⋉ guard & condi- S(x)∧T(x)∧U(x)∧V(x) tional key shar- semi-joinssequentiallybyapplyingasemi-jointotheoutput ing of the previous stage (SEQ), where the number of rounds A4 R(x,y,z,w)⋉ nosharing dependsonthenumberofsemi-joins;(ii)usingthe2-round S(x)∧T(y)∧U(z)∧V(w) strategy with algorithm Greedy-BSGF (GREEDY); and, G(x,y,z,w)⋉ W(x)∧X(y)∧Y(z)∧Z(w) (iii) a more naive version of GREEDY where no grouping A5 R(x,y,z,w)⋉ conditional occurs,i.e.,everysemi-joinisevaluatedseparatelyinparal- S(x)∧T(y)∧U(z)∧V(w) namesharing lel(PAR).Assemi-joinalgorithmsinMRhavenotreceived G(x,y,z,w)⋉ S(x)∧T(y)∧U(z)∧V(w) significant attention, we choose to compare with the two B1 R(x,y,z,w)⋉ large conjunc- extreme approaches: no parallelization (SEQ) and paral- S(x)∧T(x)∧U(x)∧V(x)∧ tivequery lelization without grouping (PAR). Relative improvements S(y)∧T(y)∧U(y)∧V(y)∧ of PARandGREEDY w.r.t.SEQareshowninFigure3b. S(z)∧T(z)∧U(z)∧V(z)∧ S(w)∧T(w)∧U(w)∧V(w) We find that both PAR and GREEDY result in lower B2 R(x,y,z,w)⋉ uniqueness net times. In particular, we see average improvements of (S(x)∧¬T(x)∧¬U(x)∧¬V(x))∨ query 39% and 31% over SEQ, respectively. 
On the other hand, (¬S(x)∧T(x)∧¬U(x)∧¬V(x))∨ (S(x)∧¬T(x)∧U(x)∧¬V(x))∨ the total times for PAR are much higher than for SEQ: (¬S(x)∧¬T(x)∧¬U(x)∧V(x)) 132% higher on average. This is explained by the increase in both input and communication bytes, whereas the data Table 2: Queries used in the BSGF-experiment size can be reduced after each step in the sequential eval- uation. For GREEDY, total times vary depending on the structureofthequery. Totaltimesaresignificantlyreduced restricted to only disjunction and negation. The same for queries where conditional atoms share join keys and/or optimizationalsoworksformultipleBSGFqueries. We relation names. This effect is most obvious for queries A1, refer to these programs as 1-ROUND below. A2 and A5 where we oberve reductions in net time of 30%, All experiments are conducted on the HPC infrastruc- 29% and 30%, respectively, w.r.t. PAR. ture of the Flemish Supercomputer Center (VSC). Each For query A3, all conditional atoms have the same join experiment was run on a cluster consisting of 10 compute key, making 1-round (1-ROUND, see Section 5.1) evalua- nodes. Each node features two 10-core “Ivy Bridge” Xeon tionpossible. Thisfurtherreducesthetotalandnettimeto E5-2680v2CPUs(2.8GHz,25MBlevel3cache)with64GB only 49% and 63% of those of PAR, respectively. ofRAMandasingle250GBharddisk. Thenodesarelinked to a IB-QDR Infiniband network. We used Hadoop 2.6.2, Hive&Pig. We now examine parallel query evaluation in Pig 0.15.0 and Hive 1.2.1; the specific Hadoop settings and Pig and Hive and show that Gumbo outperforms both sys- cost model constants can be found in [16]. All experiments tems for BSGF queries. For this test, we implement the are run three times; average results are reported. 2-roundqueryplansofSection4.4directlyinPigandHive. 
Queriestypicallycontainamultitudeofrelationsandthe For Hive, we consider two evaluation strategies: one us- input sizes of our experiments go up to 100GB depending ingHive’sleft-outer-joinoperations(HPAR)andoneusing on the query and the evaluation strategy. The data that Hive’s semi-join operations (HPARS). For Pig, we consider is used for the guard relations consists of 100M tuples that onestrategythatisimplementedusingtheCOGROUPop- add up to 4GB per relation. For the conditional relations eration (PPAR). We also studied sequential evaluation of we use the same number of tuples that add up to 1GB per BSGF queries in both systems but choose to omit the re- relation; 50% of the conditional tuples match those of the sults here as both performed drastically worse than their guard relation. Gumbo equivalent (SEQ) in terms of net and total time. We use the following performance metrics: First, we find that HPAR lacks parallelization. This is 1. total time: the aggregate sum of time spent by all causedbyHive’srestrictionthatcertainjoinoperationsare mappers and reducers; executed sequentially, even when parallel execution is en- 2. net time: elapsed time between query submission to abled. This leads to net times that are 238% higher on obtaining the final result; average, compared to PAR. Note that query A3 shows a 3. input cost: the number of bytes read from hdfs over better net time than the other queries. This is caused by the entire MR plan; Hive allowing grouping on certain join queries, effectively 4. communication cost: the number of bytes that are bringing the number of jobs (and rounds) down to 2. transferred from mappers to reducers. Next, we find that HPARS performs better than HPAR in terms of net time but is still 126% higher on average 5.2 BSGFQueries thanPAR.Thelowernet timesw.r.t.HPARareexplained by Hive allowing parallel execution of semi-join operations, Table 2 lists the type of BSGF queries used in this sec- without allowing any form of grouping. 
This effectively tion.4 Figures3&4showtheresultsthatarediscussednext. makes HPAR the Hive equivalent of PAR. The high net times are caused by Hive’s higher average map and reduce 4 The results obtained here generalize to non-conjunctive input sizes. BSGFqueries. ConjunctiveBSGFquerieswerechosenhere Finally, Pig shows an average net time increase of 254%. to simplify the comparison with sequential query plans. 739 Net Time (s)1236900000000 233 137SPEA 140QR 562 323 506GRHEEPDA 240RY 129 236 583 322 589HPPAPRARS 234 159 1301-R 303O 323UN 472D 101 285 179 173 594 328 539 248 156 183 1153 1179 587 Net Time 123456000000000000%%%%%% 100% 59%SPEA 60%QR 241% 139% 217%GRHEEPDA 100%RY 54% 98% 243% 134% 245%HPPAPRARS 100% 68% 56%1-R 129%O 138%UN 201%D 43% 100% 63% 60% 208% 115% 189% 63% 74% 465% 476% 237% Total Time (s) 123000kkk 3k 7k 6k 6k 9k 14k 3k 7k 5k 6k 9k 17k 3k 7k 4k 4k 9k 13k 3k 7k 14k 13k 6k 9k 28k 5k 14k 10k 11k 34k 31k Total Time 246800000000%%%% 100% 253% 238% 225% 324% 540% 100% 242% 170% 218% 317% 614% 100% 245% 149% 142% 319% 490% 120% 100% 191% 168% 78% 117% 373% 100% 265% 186% 211% 640% 576% Input (GB) 1 25705050 12 28 20 28 42 40 12 28 13 28 42 50 12 28 14 14 42 40 8 23 55 30 32 42 79 19 55 26 57 83 102 Input 246000000%%% 100% 240% 171% 246% 361% 344% 100% 240% 110% 246% 361% 433% 100% 240% 120% 124% 361% 344% 70% 100% 240% 132% 140% 181% 344% 100% 291% 139% 298% 437% 535% Communication (GB) 1 25705050 16 22 22 33 53 50 16 22 18 33 53 50 16 22 15 20 53 50 12 33 44 39 33 53 99 27 44 33 66 105 99 Communication 123450000000000%%%%% 100% 133% 133% 202% 322% 304% 100% 133% 108% 202% 322% 304% 100% 133% 90% 122% 322% 304% 70% 100% 133% 120% 101% 161% 304% 100% 161% 124% 244% 389% 367% A1 A2 A3 A4 A5 A1 A2 A3 A4 A5 (a)Absolutevalues. (b)ValuesrelativetoSEQ. Figure 3: Results for evaluating the BSGF queries using different strategies. Thisismainlycausedbythelackofreductioninintermedi- CostModel. 
AsexplainedinSection3.3,themajordiffer- ate data and in input bytes, together with input-based re- ence between our cost model and that of Wang et al. [36] ducer allocation (1GB of map input data per reducer). For (referredtoascostgumbo andcostwang,respectively,fromhere these queries, this leads to a low number of reducers, caus- onward) concerns identifying the individual map cost con- ing the average reduce time, and hence overall net time, to tributionsoftheinputrelations. Forquerieswherethemap go up. input/outputratiodiffersgreatlyamongtheinputrelations, As the reported net times for Hive and Pig are much we notice a vast improvement for the GREEDY strategy. higher than for sequential evaluation in Gumbo (SEQ), we We illustrate this using the following query: conclude that Pig and Hive, with default settings, are unfit R(x,y,z,w)⋉S (x¯ ,c)∧...∧S (x¯ ,c)∧ for parallel evaluation of BSGF queries. For this reason we 1 1 1 12 restrict our attention to Gumbo in the following sections. S2(x¯1,c)∧...∧S2(x¯12,c)∧ S (x¯ ,c)∧...∧S (x¯ ,c)∧ 3 1 3 12 Large Queries. Next, we compare the evaluation of two S (x¯ ,c)∧...∧S (x¯ ,c), 4 1 4 12 larger BSGF queries B1 and B2 from Table 2. The results are shown in Figure 4. Query B1 is a conjunctive BSGF where x¯1,...,x¯12 are all distinct keys and c is a constant query featuring a high number of atoms. Its structure en- that filters out all tuples from S1,...,S4. The results for sures a deep sequential plan that results in a high net time evaluating this query using GREEDY with costgumbo and forSEQ.WefindthatPARonlytakes22%ofthenettime, costwang are unmistakable: costgumbo provides a 43% reduc- which shows that parallel query plans can yield significant tion in total time and a 71% reduction in net time. The improvements. 
Conversely, PAR takes up 261% more total explanation is that costwang does not discriminate between timethanSEQ,asthelatterismoreefficientinpruningthe different input relations, it averages out the intermediate data at each step. Here, GREEDY is able to successfully data and therefore fails to account for the high number of parallelize query execution without sacrificing total time. map-sidemergesandtheaccompanyingincreaseinbothto- Indeed, GREEDY exhibits a net time comparable to that tal and net time. of PAR and a total time comparable to that of SEQ. ForqueriesA1–A5andB1–B2,whereinputrelationshave Query B2 consists of a large boolean combination and is a contribution to map output that is proportional to their called the uniqueness query. This query returns the tuples input size, we find that both cost models behave similarly. that can be connected to precisely one of the conditional When comparing two random jobs, the cost models cor- relationsthroughagivenattribute. Thenumberofdistinct rectlyidentifythehighestcostjobin72.28%(costgumbo)and conditionalatomsislimited,andthedisjunctionatthehigh- 69.37%ofthecases(costwang). Hence,wefindthatcostgumbo est level makes it possible to evaluate the four conjunctive provides a more robust cost estimation as it can isolate in- subexpressionsinparallelusingSEQ.Still,wefindthatthe put relations that have a non-proportional contribution to nettimeofof PARimprovesthatof SEQby66%. AsPAR the map output, while it automatically resorts to costwang only needs to calculate the result of four semi-join queries in the case of an equal contribution. in its first round, we also find a reduction of 57% in total time. GREEDY further reduces both numbers. Conclusion. We conclude that parallel evaluation effec- Finally,forB2,a1-roundevaluation(1-ROUND,seeSec- tively lowers net times, at the cost of higher total times. 
tion 5.1) can be considered, as only one key is used for the GREEDY, backed by an updated cost model, successfully conditional atoms. This evaluation strategy brings down managestobringdowntotaltimesofparallelevaluation,es- both net and total time of SEQ by more than 80%. pecially in the presence of commonalities among the atoms of BSGF queries. For larger queries, total times similar to 740 Net Time (s)Total Time (s)Input (GB)Communication (GB)11121122 2369 24605005055500000000000000000000kkk 9877k2435 21726k9972 1738k1927B1 82461k180217 58035k137177 43240k166113GRHESPPEEAADQRRY 36315k5479 1617k28221 1324k1415-BRHO2 2499k4353PPUAP 3329k4253NRARDS 47513k4050 653k812 1011147135791357 05050000000000000000000000000000%%%%%%%%%%%%%%%% 100% 100% 100% 100% 22% 361% 411% 206% 17% 106% 80% 76%B1 83% 844% 749% 620% 59% 479% 570% 505% 44% 560% 653% 323% 100% 100% 100% 100% 44% 43% 51% 28% 36% 27% 25% 18%B2 69% 61% 80% 67% 92% 58% 77% 66% 131% 87% 73% 63% 18% 18% 18% 18% ZZ21Z1Z(1Z4Zz3((1(z)1x¯(x()x¯):x)=)::)==:=:=G:=GRG(R(xR¯((x¯(x¯)x¯(x¯))x)¯⋉)⋉)⋉⋉⋉⋉ZZZSS11S11(((((x(xxzxx))))))∧∧Z∧∨Z∧∧2SZ3TZS21U(1(1(((y(z(yyz(y)w)y))))))::==Z(ZZ(HZI5ba21(2((x(2¯x(¯x()¯x)x(¯)))z)c):⋉=)Q⋉Q::)==:Z=HUuGuQG2((R2((xee¯yx(¯xi¯u)(nx)rr))x¯v⋉)∨eyy⋉⋉Z)is∧rZ⋉V2TTy32SST(((((Txxyxz(ee))x(C)))tty∧∧)∧∧:)=3∧TTCZCZ((R2V1yy(122Z()y()(x¯y)1x)3))Z(⋉ZzZ4Z)3(6U3x((:x(=x)¯(x¯)x):)=:I):==:(=∧Hx¯HR)HT(x((¯⋉(xx(¯¯)yx¯))⋉¬))⋉⋉S⋉∧ZZU(3UVw3((z(x(()xx)z)))∨)∧∧∧∧ZUZU3Z((3y(w1(y)3y))()w) Absolute values Values relative to SEQ Z21(x¯):=H(x¯)⋉Z11(x)∨Z12(y)∨Z23(z)∨Z24(w) Figure 4: Results for large BSGF queries. Z12(y):=R(x¯)⋉U(z)∨S(x) Z14(y):=G(x¯)⋉S(z)∨U(x) Z11(y):=R(x¯)⋉S(x)∨T(y) Z13(y):=G(x¯)⋉U(x)∨V(y) 175% SEQ-UNIT PAR-UNIT GREEDY-SGF Net Time 111 025257050505%%%%%% 100% 31% 56% 100% 51% 71% 100% 73% 78% 100% 32% 42% (d)QueryC4 Figure 6: The queries used in the SGF experiment. Each Total Time 111 0255705005%%%%% 100% 107% 58% 100% 121% 74% 100% 108% 92% 100% 67% 57% node represents one BSGF subquery (x¯=x,y,z,w). 
25% Note that in SEQUNIT and PARUNIT all semi-joins are Input 111 025257050505%%%%%% 100% 104% 52% 100% 108% 61% 100% 100% 64% 100% 79% 50% tefohvuaanltudaartteehdiadtienGntsriecepaealrdtayot-etShjGeobFosp.ytFiimeoldraslatlmloputeoltslitowsgaiccyoanltdosupocrottloe(dgciochmaelrpesu,otrwetdes mmunication 11 02570505%%%% 100% 105% 69% 100% 105% 79% 100% 95% 85% 100% 76% 72% ttrhoeSuiomgphitliabmrruattolepo-fluoarrncose.bsmerevtahtoidons)s;fhoernBcSe,GwFeqoumeriitetsh,ewerefisnudltsthfoart Co 25% fullsequentialevaluation(SEQUNIT)resultsinthelargest C1 C2 C3 C4 nettimes. Indeed,PARUNITexhibits55%lowernettimes on average. We also observe that PARUNIT exhibits sig- nificantly larger total times than SEQUNIT for queries C1 Figure 5: SGF results, values relative to SEQUNIT. and C2, while this is not the case for C3 and C4. The rea- son is that for C3 and C4, queries on the same level still share common characteristics, leading to a lower number of SEQ are obtained. Finally, Gumbo outperforms Pig and distinct semi-joins. Hive in all aspects when it comes to parallel evaluation of ForGreedy-SGF,wefindthatitexhibitsnettimesthat BSGF queries. are,onaverage,42%lowerthanSEQUNIT,whilestillbeing 29%higherthanPARUNIT.Themainreasonforthisisthe 5.3 SGFQueries fact that Greedy-SGF aims to minimize total time, and Inthissection,weshowthatthealgorithmGreedy-SGF may introduce extra levels in the MR query plan to obtain succeeds in lowering total time while avoiding significant this goal. Indeed, we find that total times are down 27% increaseinnettime. Figure6givesanoverviewofthetypeof w.r.t. SEQUNIT, and 29% w.r.t. PARUNIT. queriesthatareused. ResultsaredepictedinFigure5. Note Finally, we note that the absolute savings in net time thatthesequeriesallexhibitdifferentproperties. QueriesC1 range from 115s to 737s for these queries, far outweighing andC2bothcontainasetofSGFquerieswhereanumberof theoverheadcostofcalculatingthequeryplanitself,which atoms overlap. 
Query C3 is a complex query that contains typically takes around 10s (sampling included). Hence, we a multitude of different atoms. Finally, Query C4 consists concludethatGreedy-SGFprovidesanevaluationstrategy of two levels and many overlapping atoms. for SGF queries that manages to bring down the total time WeconsiderthefollowingevaluationstrategiesinGumbo: (andhence,theresourcecost)ofparallelqueryplans,while (i) sequentially, i.e., one at a time, evaluating all BSGF still exhibiting low net times when compared to sequential queries in a bottom-up fashion (SEQUNIT); (ii) evaluat- approaches. ing all BSGF queries in a bottom-up fashion level by level 5.4 SystemCharacteristics where queries on the same level are executed in parallel (PARUNIT); and, (iii) using the greedily computed topo- In this final experiment, we study the effect of growing logicalsortcombinedwithGreedy-BSGF(Greedy-SGF); datasize,clustersize,querysize,andselectivity. Wechoose 741
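As an illustration of how the query of Example 4 decomposes into semi-joins followed by a Boolean evaluation, the following in-memory Python sketch computes X1, X2, X3 and then applies the formula X1 AND (X2 OR NOT X3). It is not a MapReduce implementation and not Gumbo's code; the sample tuples and the helper name `semi_join` are assumptions made for this illustration only.

```python
# Hypothetical sample data for Example 4 (not taken from the paper).
R = [(1, 10), (2, 20), (3, 30), (4, 40)]   # guard relation R(x, y)
S = [(1, "a"), (3, "b"), (4, "c")]         # conditional relation S(x, z)
T = [(10,), (20,)]                         # conditional relation T(y)
U = [(3,)]                                 # conditional relation U(x)


def semi_join(guard, guard_key, cond_keys):
    """Return the guard tuples whose join key occurs among cond_keys."""
    return {t for t in guard if guard_key(t) in cond_keys}


# One semi-join per conditional atom (the role of the MSJ jobs).
X1 = semi_join(R, lambda t: t[0], {s[0] for s in S})   # R semi-join S on x
X2 = semi_join(R, lambda t: t[1], {t[0] for t in T})   # R semi-join T on y
X3 = semi_join(R, lambda t: t[0], {u[0] for u in U})   # R semi-join U on x

# Boolean combination per guard tuple (the role of the EVAL job).
Z = [t for t in R if t in X1 and (t in X2 or t not in X3)]
print(Z)
```

Whether the three semi-joins run as one, two, or three jobs only changes the grouping of the middle step; the final Boolean evaluation is the same in all three alternatives.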
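The structure of Equation (9) can be made concrete with a small brute-force sketch that enumerates every partition of the semi-joins of Example 4 and sums the job costs. The per-job cost used here is a toy stand-in, not the paper's Equation (5): the relation sizes, `JOB_OVERHEAD`, `EVAL_COST`, and the function names are all assumptions. Under this toy cost a single job always wins, which need not hold under the actual cost model.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):           # add `first` to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part               # or start a new block


# Hypothetical inputs: relation sizes and the relations each semi-join reads.
SIZE = {"R": 100, "S": 20, "T": 20, "U": 20}
READS = {"X1": {"R", "S"}, "X2": {"R", "T"}, "X3": {"R", "U"}}
JOB_OVERHEAD = 50    # assumed fixed cost per MR job
EVAL_COST = 30       # assumed value of cost(EVAL(R, phi_C))


def toy_msj_cost(block):
    """Toy stand-in for cost(S_i): overhead plus each distinct input read once."""
    relations = set().union(*(READS[x] for x in block))
    return JOB_OVERHEAD + sum(SIZE[r] for r in relations)


def program_cost(partition):
    """Shape of Equation (9): cost(EVAL) plus the sum of the MSJ job costs."""
    return EVAL_COST + sum(toy_msj_cost(block) for block in partition)


programs = list(set_partitions(["X1", "X2", "X3"]))
best = min(programs, key=program_cost)
jobs = len(best) + 1   # p MSJ jobs plus one EVAL job
print(best, program_cost(best), jobs)
```

The number of partitions grows very quickly with the number of semi-joins, so exhaustive enumeration like this is only practical for tiny inputs.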
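The two conditions that define a multiway topological sort in Section 4.6 can be checked mechanically. The sketch below is a direct transcription of that definition; the function name and the encoding of Example 5 as string labels and edge pairs are choices made for this illustration.

```python
def is_multiway_topological_sort(groups, nodes, edges):
    """Check conditions 1 and 2 of the definition in Section 4.6.

    groups: sequence of sets (F1, ..., Fk)
    nodes:  list of all vertices of the dependency graph
    edges:  list of (u, v) pairs, meaning v depends on u
    """
    placed = [q for group in groups for q in group]
    # Condition 1: the groups are non-empty and partition the node set.
    if any(len(group) == 0 for group in groups):
        return False
    if sorted(placed) != sorted(nodes):
        return False
    # Condition 2: every edge goes from an earlier group to a later one.
    index = {q: i for i, group in enumerate(groups) for q in group}
    return all(index[u] < index[v] for u, v in edges)


# Dependency graph of Example 5.
nodes = ["Q1", "Q2", "Q3", "Q4", "Q5"]
edges = [("Q1", "Q2"), ("Q2", "Q3"), ("Q3", "Q5"), ("Q4", "Q5")]

listed_sorts = [
    [{"Q1", "Q4"}, {"Q2"}, {"Q3"}, {"Q5"}],
    [{"Q1"}, {"Q2", "Q4"}, {"Q3"}, {"Q5"}],
    [{"Q1"}, {"Q2"}, {"Q3", "Q4"}, {"Q5"}],
    [{"Q1"}, {"Q2"}, {"Q3"}, {"Q4"}, {"Q5"}],
]
print([is_multiway_topological_sort(s, nodes, edges) for s in listed_sorts])
```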
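The Greedy-SGF iteration of Section 4.6 can be sketched in a few lines of Python. This is one reading of steps 1 to 5, not the Gumbo implementation, and it rests on three assumptions: overlap counts the relations in a query's guard and conditional atoms (which reproduces the stated overlap of 1 for Q2 in Example 5), ties are broken by vertex name and group position, and when no pair qualifies the vertex appended is the smallest element of D.

```python
def overlap(u, group, rels):
    """Number of relations of query u that also occur in some query of group."""
    group_rels = set().union(*(rels[q] for q in group))
    return len(rels[u] & group_rels)


def greedy_sgf(preds, rels):
    """Return a multiway topological sort as a list of sets.

    preds: maps a query to the set of queries it depends on
    rels:  maps a query to the relations used in its guard and conditions
    """
    blue = set(rels)          # vertices not yet placed
    X = []                    # the sort built so far
    while blue:
        # Step 2: blue vertices with no blue predecessor.
        D = sorted(u for u in blue if not (preds.get(u, set()) & blue))
        best = None           # (overlap, vertex, group index)
        for u in D:
            # Step 3: u may only join a group after all of its predecessors.
            last_pred = max(
                (i for i, F in enumerate(X) if F & preds.get(u, set())),
                default=-1,
            )
            for i in range(last_pred + 1, len(X)):
                ov = overlap(u, X[i], rels)
                if ov > 0 and (best is None or ov > best[0]):
                    best = (ov, u, i)
        # Step 4: extend the best group, or append a new singleton group.
        if best is not None:
            _, u, i = best
            X[i].add(u)
        else:
            u = D[0]          # assumption: any vertex of D may be chosen
            X.append({u})
        blue.remove(u)        # Step 5: color u red
    return X


# Example 5 encoded by hand.
preds = {"Q2": {"Q1"}, "Q3": {"Q2"}, "Q5": {"Q3", "Q4"}}
rels = {
    "Q1": {"R1", "S"},
    "Q2": {"Z1", "T"},
    "Q3": {"Z2", "U"},
    "Q4": {"R2", "T"},
    "Q5": {"Z3", "Z4"},
}
result = greedy_sgf(preds, rels)
print(result)
```

On this input the sketch groups Q4 with Q2, the only pair of queries that share a relation (T), which corresponds to the second of the four sorts listed in Example 5.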
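Optimization (3) in Section 5.1 sets the number of reducers from an estimate of the intermediate data size. The arithmetic can be sketched as below; the function names, the linear scaling of the sample, and the reading of 256MB as 256 × 1024² bytes are assumptions, since the text does not spell out these details.

```python
import math


def estimate_intermediate_bytes(sample, map_fn, total_tuples):
    """Run map_fn on a sample and scale its output size to the full input.

    map_fn takes one tuple and returns a list of byte strings
    (the records it would emit).
    """
    if not sample:
        return 0
    sample_bytes = sum(len(record) for t in sample for record in map_fn(t))
    return math.ceil(sample_bytes * total_tuples / len(sample))


def num_reducers(intermediate_bytes, bytes_per_reducer=256 * 1024**2):
    """At least one reducer, and one more for every started 256MB block."""
    return max(1, math.ceil(intermediate_bytes / bytes_per_reducer))


# Hypothetical usage: every sampled tuple emits one 10-byte record.
sample = [(1, 2), (3, 4)]
estimate = estimate_intermediate_bytes(sample, lambda t: [b"0123456789"], 100)
print(estimate, num_reducers(estimate), num_reducers(1024**3))
```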
