High Performance Multidimensional Analysis and Data Mining

Sanjay Goil and Alok Choudhary
Center for Parallel and Distributed Computing, Department of Electrical & Computer Engineering,
Northwestern University, Technological Institute, 2145 Sheridan Road, Evanston, IL 60208
{sgoil,choudhar}@ece.nwu.edu

Abstract

Summary information from data in large databases is used to answer queries in On-Line Analytical Processing (OLAP) systems and to build decision support systems over them. The Data Cube is used to calculate and store summary information on a variety of dimensions, and is computed only partially if the number of dimensions is large. Queries posed on such systems are quite complex and require different views of the data. These may either be answered from a materialized cube in the data cube or calculated on the fly. Further, data mining for associations can be performed on the data cube. Analytical models need to capture the multidimensionality of the underlying data, a task for which multidimensional databases are well suited. They are also amenable to parallelism, which is necessary to deal with large (and still growing) data sets. Multidimensional databases store data in a multidimensional structure on which analytical operations are performed. A challenge for these systems is how to handle large data sets in a large number of dimensions. These techniques are also applicable to scientific and statistical databases (SSDB), which employ large multidimensional databases and dimensional operations over them. In this paper we present (1) a parallel infrastructure for OLAP multidimensional databases integrated with association rule mining, (2) the Bit-Encoded Sparse Structure (BESS) for sparse data storage in chunks, (3) scheduling optimizations for parallel computation of complete and partial data cubes, and (4) an implementation of a large scale multidimensional database engine suitable for the dimensional analysis used in OLAP and SSDB, for (a) a large number of dimensions and (b) large data sets (tens of Gigabytes). Our implementation on the IBM SP-2 can handle large data sets and a large number of dimensions, using disk I/O. Results are presented showing its performance and scalability.

(This work was supported in part by NSF Young Investigator Award CCR-9357840 and NSF CCR-9509143.)

1 Introduction

Decision support systems use On-Line Analytical Processing (OLAP) and data mining to find interesting information from large databases. Multidimensional databases are suitable for OLAP and data mining since these applications require dimension oriented operations on data. Traditional multidimensional databases store data in multidimensional arrays on which analytical operations are performed. Multidimensional arrays are good for storing dense data, but most datasets are sparse in practice, and other efficient storage schemes are required for them. For large databases, reduced storage requirements can reduce the I/O time needed for data accesses during operations, provided appropriate data structures are used. Each cell in a multidimensional array can be accessed by an index in each of its dimensions, so finding an element in it is a trivial operation. Sparse data structures provide efficient storage, but dimension oriented operations on them can be more expensive than on multidimensional arrays because of the additional processing required to de-reference indices. It is therefore important to weigh the trade-off between the reduction in storage space and the increase in access time for each sparse data structure, in comparison to multidimensional arrays. These trade-offs depend on many parameters, some of which are (1) the number of dimensions, (2) the sizes of the dimensions and (3) the degree of sparsity of the data. Complex operations such as those required for data-driven data mining can be very expensive in terms of data access time if efficient data structures are not used. We compare the storage and operational efficiency of various sparse data storage schemes for OLAP and data mining in [GC97b].
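To make the storage/access trade-off concrete, here is a minimal sketch (in Python, with an invented 3-D shape and a plain dictionary standing in for a generic sparse structure): the dense array finds a cell by index arithmetic alone, while the sparse store keeps only non-empty cells and pays extra de-referencing on every access. It is not one of the paper's storage schemes, only an illustration of the trade-off.

```python
# Sketch: dense multidimensional storage vs. a sparse store of non-empty cells.
# The 3-D shape and the dict-based sparse layout are illustrative assumptions.
dims = (100, 100, 100)                           # |d0| x |d1| x |d2|

dense = [0.0] * (dims[0] * dims[1] * dims[2])    # storage grows with the full cube
sparse = {}                                      # (i, j, k) -> value; grows with non-zeros

def dense_get(i, j, k):
    # direct index arithmetic: trivial element access
    return dense[(i * dims[1] + j) * dims[2] + k]

def sparse_get(i, j, k):
    # extra de-referencing: hash the index tuple, default to 0 for missing cells
    return sparse.get((i, j, k), 0.0)

# Populate a handful of cells; the sparse store keeps only those entries.
for i, j, k in [(1, 2, 3), (40, 0, 99), (7, 7, 7)]:
    dense[(i * dims[1] + j) * dims[2] + k] = 1.0
    sparse[(i, j, k)] = 1.0

print(dense_get(1, 2, 3), sparse_get(1, 2, 3), sparse_get(0, 0, 0))
```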
A novel data structure that uses bit encodings for dimension indices, called the Bit-Encoded Sparse Structure (BESS), is used to store data in chunks. It supports fast OLAP query operations on sparse data using bit operations, without the need for exploding the sparse data into a multidimensional array. This allows for high dimensionality and large dimension sizes. Our techniques for multidimensional analysis can also be applied to scientific and statistical databases [LS98], which rely heavily on dimensional analysis.

In this paper we present a parallel multidimensional framework for large data sets in OLAP and SSDB which has been integrated with data mining of association rules. Parallel data cube construction for large data sets and a large number of dimensions using sparse storage structures is presented. Sparsity is handled with chunks compressed using BESS. Three schedules for constructing complete and partial data cubes from the cube lattice are developed, using cost estimates between parent and child aggregates: (1) a pipelined method for cubes, (2) a pipelined method for chunks and (3) a level by level method. These incorporate various optimizations to reduce computation, communication and I/O overheads. Chunk management strategies are described to store the various intermediate cubes and handle large data sets.

The rest of the paper is organized as follows. Section 2 describes chunk storage using BESS and the multidimensional infrastructure. Section 3 describes mining of association rules on the data cube. Section 4 describes strategies and optimizations for parallel aggregate computation. Section 5 presents results of our implementation on the IBM SP2. Section 6 concludes the paper.

2 Multidimensional Data Storage and BESS

Multidimensional database technology facilitates flexible, high performance access to and analysis of large volumes of complex and interrelated data [Sof97]. It is more natural and intuitive for humans to model a multidimensional structure. A chunk is defined as a block of data from the multidimensional array which contains data in all dimensions. A collection of chunks defines the entire array. Figure 1(a) shows the chunking of a three dimensional array. A chunk is stored contiguously in memory, and data in each dimension is strided with the dimension sizes of the chunk. Most sparse data is not uniformly sparse: dense clusters of data can be stored as multidimensional arrays, while sparse data structures are needed to store the sparse portions of the data. Chunks can therefore be stored either as dense arrays or using an appropriate sparse data structure, as illustrated in Figure 1(b). Chunks also act as an index structure which helps in extracting data for queries and OLAP operations.

[Figure 1: Storage of data in chunks. (a) Chunking of a 3D array into chunks c1 ... c32; (b) chunked storage for the cube, with each chunk kept either as a dense array in memory or in a sparse data structure of (value, offset) pairs.]
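The following sketch illustrates the chunking idea for a small 3-D array: cells are grouped into fixed-size blocks, a cell's global index maps to a (chunk coordinates, offset within chunk) pair, and each chunk can then be kept dense or sparse independently. The array and chunk shapes and the helper name are assumptions for illustration; the paper's engine additionally manages chunks on disk and across processors.

```python
# Sketch: map a global cell index to (chunk coordinates, offset inside the chunk)
# for a 3-D array split into fixed-size chunks. Shapes are illustrative only.
dims = (8, 8, 8)          # array dimensions d0, d1, d2
chunk_dims = (4, 4, 4)    # each chunk covers a 4x4x4 block, giving 2x2x2 = 8 chunks

def chunk_of(index):
    """Return the chunk coordinates and the cell's offset within that chunk."""
    chunk_coord = tuple(i // c for i, c in zip(index, chunk_dims))
    local = tuple(i % c for i, c in zip(index, chunk_dims))
    # Chunks are stored contiguously; data inside a chunk is strided by the
    # chunk dimensions, not by the full array dimensions.
    offset = (local[0] * chunk_dims[1] + local[1]) * chunk_dims[2] + local[2]
    return chunk_coord, offset

print(chunk_of((5, 2, 7)))   # -> ((1, 0, 1), offset of the cell inside chunk (1, 0, 1))
```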
Typically, sparse structures have been used for the advantages they provide in terms of storage, but operations on the data are performed on a multidimensional array which is populated from the sparse data. However, this is not always possible when either the dimension sizes are large or the number of dimensions is large. Since we are dealing with multidimensional structures for a large number of dimensions, we are interested in performing operations on the sparse structure itself. This is desirable to reduce I/O costs by having more data in memory to work on, and it is one of the primary motivations for our Bit-Encoded Sparse Structure (BESS). For each cell present in a chunk, a dimension index is encoded in ⌈log|d_i|⌉ bits for each dimension d_i of size |d_i|. An 8-byte encoding is used to store the BESS index along with the value at that location; a larger encoding can be used if the number of dimensions is larger than 20. A dimension index can then be extracted by a bit mask operation. Aggregation along a dimension d_i can be done by masking its dimension encoding in BESS and using a sort operation to get the duplicate resultant BESS values together. This is followed by a scan of the BESS index, aggregating values for each duplicate BESS index. For dimensional analysis, aggregation needs to be done for the appropriate chunks along a dimensional plane.
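A minimal sketch of the BESS idea follows: each dimension index is packed into its own bit field, a single mask-and-shift recovers any index, and aggregation along one dimension masks that field out, sorts, and scans. The field layout, names and the in-memory list of (key, value) pairs are assumptions for illustration; the paper packs the encoding into 8 bytes alongside the cell value.

```python
# Sketch of a BESS-style encoding: pack one index per dimension into bit fields.
# Dimension sizes, field layout and data are illustrative assumptions.
from math import ceil, log2

dim_sizes = [6, 10, 4]                               # |d0|, |d1|, |d2|
bits = [max(1, ceil(log2(s))) for s in dim_sizes]    # ceil(log|di|) bits per index
shifts = [sum(bits[i + 1:]) for i in range(len(bits))]

def encode(index):
    """Pack (i0, i1, i2) into a single integer key."""
    key = 0
    for i, b, s in zip(index, bits, shifts):
        key |= i << s
    return key

def extract(key, d):
    """Recover the index of dimension d with a mask and shift."""
    return (key >> shifts[d]) & ((1 << bits[d]) - 1)

def aggregate_out(cells, d):
    """Aggregate along dimension d: mask its field out, sort, then scan-and-sum."""
    mask = ~(((1 << bits[d]) - 1) << shifts[d])
    reduced = sorted((key & mask, val) for key, val in cells)
    out = {}
    for key, val in reduced:           # duplicate keys are adjacent after the sort
        out[key] = out.get(key, 0.0) + val
    return out

cells = [(encode((1, 2, 3)), 5.0), (encode((1, 7, 3)), 2.0)]
print(aggregate_out(cells, 1))         # both cells collapse to the same masked key
```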
3 OLAP and Data Mining

OLAP is used to summarize, consolidate, view, apply formulae to, and synthesize data according to multiple dimensions. Traditionally, a relational approach (relational OLAP) has been taken to build such systems: relational databases are used to build and query them. A complex analytical query is cumbersome to express in SQL and may not be efficient to execute. Alternatively, multi-dimensional database techniques (multi-dimensional OLAP) have been applied to decision-support applications. Data is stored in multi-dimensional structures, which is a more natural way to express the multi-dimensionality of the enterprise data and is more suited for analysis. A "cell" in multi-dimensional space represents a tuple, with the attributes of the tuple identifying the location of the tuple in the multi-dimensional space and the measure values representing the content of the cell. Various sparse storage techniques have been applied to deal with sparse data in earlier methods. We use the bit-encoded sparse structure (BESS) for compressing chunks since it is suited for fast dimensional operations. We have evaluated the nature of query operations in OLAP and SSDB in [GC97b]; these are dimension oriented and can be classified as:

- retrieval of a random cell element
- retrieval along a dimension or a combination of dimensions
- retrieval of values of dimensions within a range (range queries)
- aggregation operations on dimensions
- multi-dimensional aggregation (generalization/consolidation, i.e. moving from a lower to a higher level in a hierarchy)

In [GC97b], an analysis of each of these query operations is provided for various sparse data structures, and BESS is shown to be superior to the others.

Data can be organized into a data cube by calculating all possible combinations of GROUP-BYs [GBLP96]. This operation is useful for answering OLAP queries which use aggregation on different combinations of attributes. For a data set with k attributes this leads to 2^k GROUP-BY calculations. A data cube treats each of the k aggregation attributes as a dimension in k-space. An aggregate of a particular set of attribute values is a point in this space, and the set of points forms a k-dimensional cube. Data Cube operators generalize the histogram, cross-tabulation, roll-up, drill-down and sub-total constructs required by financial databases.
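The 2^k GROUP-BYs of the cube operator are simply all subsets of the k attributes. The short sketch below enumerates them and, for a toy relation, computes each aggregate with a hash-based group-by; the relation, attribute names and SUM measure are invented for illustration and this is not the paper's parallel construction algorithm.

```python
# Sketch: enumerate the 2^k GROUP-BYs of the cube operator on a toy relation.
from itertools import combinations

rows = [                       # (A, B, C, measure) -- illustrative data
    ("a1", "b1", "c1", 4.0),
    ("a1", "b2", "c1", 2.0),
    ("a2", "b1", "c2", 1.0),
]
attrs = ("A", "B", "C")

def group_by(subset):
    """SUM the measure grouped on the given subset of attributes."""
    idx = [attrs.index(a) for a in subset]
    agg = {}
    for row in rows:
        key = tuple(row[i] for i in idx)        # () is the ALL aggregate
        agg[key] = agg.get(key, 0.0) + row[3]
    return agg

cube = {subset: group_by(subset)
        for r in range(len(attrs) + 1)
        for subset in combinations(attrs, r)}   # 2^3 = 8 GROUP-BYs
print(len(cube), cube[("A",)])
```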
3.1 Attribute-Oriented Mining On Data Cubes

Data mining techniques are used to discover patterns and relationships in data which can enhance our understanding of the underlying domain. Discovery of quantitative rules is associated with quantitative information from the database. The data cube represents quantitative summary information for a subset of the attributes. Data mining techniques like associations, classification, clustering and trend analysis [FPSSU94] can be used together with OLAP to discover knowledge from data. Attribute-oriented approaches [Bha95] [HCC93] [HF95] to data mining are data-driven and can use this summary information to discover association rules.

Transaction data can be used to mine association rules by associating a support and a confidence with each rule. The support of a pattern A in a set S is the ratio of the number of transactions containing A to the total number of transactions in S. The confidence of a rule A → B is the probability that pattern B occurs in S when pattern A occurs in S; it can be defined as the ratio of the support of AB to the support of A, i.e. P(B|A). The rule is then described as A → B [support, confidence], and a strong association rule has a support greater than a pre-determined minimum support and a confidence greater than a pre-determined minimum confidence. This can also be taken as the measure of "interestingness" of the rule. Calculation of support and confidence for the rule A → B involves the aggregates from the cubes AB, A and ALL. Additionally, dimension hierarchies can be utilized to provide multiple level data mining by progressive generalization (roll-up) and deepening (drill-down). This is useful for data mining at multiple concept levels, and interesting information can potentially be obtained at different levels.

An event E is a string E_n = x_1, x_2, x_3, ..., x_n in which, for some j, k ∈ [1, n], x_j is a possible value for some attribute and x_k is a value for a different attribute of the underlying data. Whether or not E is interesting with respect to x_j's occurrence depends on the occurrences of the other x_k's. The "interestingness" measure is the size I_j(E) of the difference between (a) the probability of E among all such events in the data set and (b) the probability that x_1, x_2, x_3, ..., x_{j-1}, x_{j+1}, ..., x_n and x_j occurred independently. The condition of interestingness can then be defined as I_j(E) > t, where t is some fixed threshold.

Consider a 3 attribute data cube with attributes A, B and C, defining E_3 = ABC. To show 2-way associations, the interestingness function is calculated between A and B, between A and C, and finally between B and C. When calculating associations between A and B, the probability of E, denoted by P(AB), is the ratio of the aggregation values in the sub-cube AB to ALL. Similarly, the independent probability of A, P(A), is obtained from the values in the sub-cube A by dividing them by ALL, and P(B) is calculated likewise from B. The calculation |P(AB) − P(A)P(B)| > t, for some threshold t, is performed in parallel. Since the cubes AB and A are distributed along the A dimension, no replication of A is needed. However, since B is distributed in sub-cube B, and B is local on each processor in AB, B needs to be replicated on all processors. Similarly, an appropriate placement of values is required for a two dimensional data distribution.
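As a worked example of the 2-way test above, the sketch below derives P(AB), P(A) and P(B) from the AB, A and ALL aggregates of a toy cube and flags the pairs with |P(AB) − P(A)P(B)| > t; the same aggregates also give support and confidence for the rule A → B. The counts, attribute values and threshold are invented for illustration; the paper performs these calculations in parallel on distributed sub-cubes.

```python
# Sketch: 2-way interestingness and rule support/confidence from cube aggregates.
# The aggregate counts and the threshold t are illustrative assumptions.
AB = {("a1", "b1"): 40, ("a1", "b2"): 10, ("a2", "b1"): 10, ("a2", "b2"): 40}
A = {"a1": 50, "a2": 50}            # AB aggregated over B
B = {"b1": 50, "b2": 50}            # AB aggregated over A
ALL = sum(AB.values())              # the ALL aggregate
t = 0.05

for (a, b), count in AB.items():
    p_ab = count / ALL
    p_a, p_b = A[a] / ALL, B[b] / ALL
    interesting = abs(p_ab - p_a * p_b) > t          # 2-way interestingness test
    support = p_ab                                   # support of the rule a -> b
    confidence = count / A[a]                        # P(b | a)
    print(f"{a}->{b}: support={support:.2f} confidence={confidence:.2f} "
          f"interesting={interesting}")
```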
4 Implementation and Optimizations

4.1 Data Partitioning

Data is distributed across processors to divide the work equitably. In addition, a partitioning scheme for multidimensional data has to be dimension-aware and, for dimension-oriented operations, must have some regularity in the distribution. A dimension, or a combination of dimensions, can be distributed. In order to achieve sufficient parallelism, the product of the cardinalities of the distributed dimensions must be much larger than the number of processors. For example, for 5 dimensional data (ABCDE), a 1D distribution will partition A and a 2D distribution will partition AB. We assume that dimensions with cardinalities much greater than the number of processors are available in both cases; that is, either |A_i| ≫ p for some i, or |A_i||A_j| ≫ p for some i, j, 1 ≤ i, j ≤ n, where n is the number of dimensions. Partitioning determines the communication requirements for data movement in the intermediate aggregate calculations in the data cube. Figure 2 illustrates 1D and 2D partitions of a 3-dimensional data set on 4 processors.

[Figure 2: 1D and 2D partitions of 3 dimensions (A, B, C) on 4 processors (P0-P3).]

4.2 Partial cubes

Attribute focusing techniques for m-way association rules and meta-rule guided mining for association rules with m predicates in the rule require that all aggregates with m dimensions be materialized. These are called essential aggregates. Since data mining calculations are performed on the cubes with at most m dimensions, the complete data cube is not needed; only cubes up to level m in the data cube lattice are required. To calculate this partial cube efficiently, a schedule is required to minimize the intermediate calculations. Since the number of aggregates is a fraction of the total number (2^n) in a complete cube, a different schedule, with additional optimizations to the ones discussed earlier, is needed to minimize and calculate the intermediate, non-essential aggregates that help materialize the essential aggregates. Also, for a large number of dimensions it is impractical to calculate the complete data cube for OLAP. A subset of the aggregates can be materialized in our framework, and queries can be answered from the existing materializations, or others can be calculated on the fly. Data cube reductions for OLAP have been a topic of considerable research; [HRU96] presents a greedy strategy to materialize cubes depending on their benefit.

Consider an n dimensional data set, the cube lattice discussed earlier, and a level by level construction of the partial data cube. The set of intermediate cubes between levels n and m is essential if it is capable of calculating the next level of essential cubes by aggregating a single dimension. This provides the minimum set at a level k + 1 which can cover the calculations at level k. To obtain the essential aggregates, we traverse the lattice (Figure 3(a)) level by level, starting from level 0. For a level k, each entry has to be matched with an entry from level k + 1 such that the cost of calculation is lowest and the minimum number of non-essential aggregates is calculated. We have to determine the lowest number of non-essential intermediate aggregates to be materialized. This can be done in one of two ways. For each entry in level k we choose an arc to the leftmost entry in level k + 1 from which it can be calculated; this is referred to as the left schedule, shown in Figure 3(b). For example, ABC has arcs to ABCD and ABCE, and the left schedule will choose ABCD to calculate ABC. Similarly, a right schedule can be calculated by considering the arcs from the right side of the lattice (Figure 3(c)). Further optimizations on the left schedule can be performed by replacing ABC → AC with ACE → AC, since ACE is also being materialized and the latter involves a local calculation if 2D partitioning is considered. Of course, there are different trade-offs involved in choosing one over the other, in terms of the sizes of the cubes, the partitioning of data on processors and the inter-processor communication involved.
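A small sketch of the left-schedule idea: list the aggregates at each level of the lattice in lexicographic order and, for every aggregate at level k, pick the leftmost aggregate at level k + 1 that contains it as its parent (a right schedule would scan from the other end). The function below ignores the cost estimates, data partitioning and communication terms the paper uses to refine the choice, so it is only a structural illustration under those assumptions.

```python
# Sketch: build a "left schedule" for a partial cube down to level m.
# Cost-based refinements and parallel placement are omitted.
from itertools import combinations

def left_schedule(dims, m):
    """Map each aggregate at levels m..n-1 to the leftmost parent one level up."""
    n = len(dims)
    schedule = {}
    for k in range(n - 1, m - 1, -1):                 # levels n-1 down to m
        children = ["".join(c) for c in combinations(dims, k)]
        parents = ["".join(c) for c in combinations(dims, k + 1)]
        for child in children:
            # leftmost parent (in lexicographic lattice order) containing the child
            parent = next(p for p in parents if set(child) <= set(p))
            schedule[child] = parent
    return schedule

sched = left_schedule("ABCDE", 2)
print(sched["ABC"])   # -> 'ABCD' (the left schedule picks ABCD over ABCE)
print(sched["AC"])    # -> 'ABC'; the paper's optimization may substitute ACE -> AC
```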
[Figure 3: (a) Lattice for the cube operator on ABCDE, with levels 0 through 5 running from ALL at the top, through the single attributes A-E, the pairs AB-DE, the triples ABC-CDE and the quadruples ABCD-BCDE, down to ABCDE; (b) DAG for the partial data cube (left schedule); (c) right schedule.]

4.3 Number of aggregates to materialize for mining 2-way associations

In a full data cube, the number of aggregates is exponential. In the cube lattice there are n + 1 levels, and each level i contains C(n, i) aggregates, each with i dimensions. The total number of aggregates is thus 2^n. For a large number of dimensions, this leads to a prohibitively large number of cubes to materialize. Next, we need to determine the number of intermediate aggregates needed to calculate all the aggregates for all levels below a specified m, m < n, for m-way associations and for meta-rules with m predicates. For m = 2, the number of aggregates at level 2 is Σ_{j=1}^{n−1} j. For level 3 it is Σ_{j=1}^{n−2} j. At each level k, the first k − 1 literals in the representation are fixed and the others are chosen from the remaining. The total number of aggregates is then Σ_{i=1}^{n−1} Σ_{j=1}^{i} j.
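To make these counts concrete, the short check below evaluates the sums for n = 5 and compares them with the corresponding binomial counts and with the full cube's 2^n: the partial cube for 2-way associations needs far fewer aggregates than the complete cube. This is only a numeric illustration of the formulas as reconstructed above, not code from the paper.

```python
# Numeric check of the aggregate counts for a small n (n = 5 dimensions here).
from math import comb

n = 5
level2 = sum(range(1, n))                 # sum_{j=1}^{n-1} j
level3 = sum(range(1, n - 1))             # sum_{j=1}^{n-2} j
total = sum(sum(range(1, i + 1)) for i in range(1, n))   # sum_{i=1}^{n-1} sum_{j=1}^{i} j

print(level2, comb(n, 2))        # 10 10 -- all 2-dimensional aggregates
print(level3, comb(n - 1, 2))    # 6 6   -- intermediates needed at level 3
print(total, 2 ** n)             # 20 32 -- far fewer than the full cube's 2^n aggregates
```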
