
Approximation and Search Optimization on Massive Data Bases and Data Streams PDF

290 Pages·2014·3.82 MB·English
by Kai Zeng

Preview Approximation and Search Optimization on Massive Data Bases and Data Streams

UCLA
UCLA Electronic Theses and Dissertations

Title: Approximation and Search Optimization on Massive Data Bases and Data Streams
Permalink: https://escholarship.org/uc/item/4pv2n0vs
Author: Zeng, Kai
Publication Date: 2014
Peer reviewed | Thesis/dissertation

eScholarship.org, powered by the California Digital Library, University of California

UNIVERSITY OF CALIFORNIA
Los Angeles

Approximation and Search Optimization on Massive Data Bases and Data Streams

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science

by Kai Zeng

2014

© Copyright by Kai Zeng 2014

The dissertation of Kai Zeng is approved.

Junghoo Cho
Tyson Condie
Yingnian Wu
Carlo Zaniolo, Committee Chair

University of California, Los Angeles
2014

To my parents and my wife... for their unconditional love

TABLE OF CONTENTS

1 Introduction
  1.1 Search Challenges
  1.2 Analytics Challenges
  1.3 Overview and Contributions
    1.3.1 K*SQL and XSeq: Expressive and Efficient CEP Languages
    1.3.2 Trinity.RDF: Web Scale Graph Engine
    1.3.3 EARL and ABM: Bootstrap-Based Approximation Techniques for Interactive Data Analytics

I SEARCH OPTIMIZATION

2 Background: Nested Words and Visibly Pushdown Automata
  2.1 Nested Words
  2.2 Visibly Pushdown Automata
  2.3 Difference between Nested Words and VPAs

3 K*SQL: Unifying Languages and Query Execution for Relational and XML Sequences
  3.1 K*SQL By Examples
    3.1.1 Nested Kstars
    3.1.2 Linear-Hierarchical Data
  3.2 Expressive Power
    3.2.1 K*SQL vs. XPath
    3.2.2 K*SQL vs. Other Sequence Languages
    3.2.3 Monadic Second Order Logic
  3.3 Optimization
    3.3.1 Compile-time Optimization
    3.3.2 Optimization for Nested Constructs
  3.4 Experiments
    3.4.1 XML queries in K*SQL
    3.4.2 Query Execution Time
    3.4.3 Number of Backtracks
  3.5 Related Work
  3.6 Summary of K*SQL
  3.7 K*SQL Syntax and Expressive Power
    3.7.1 K*SQL Syntax
    3.7.2 K*SQL for Other Domains
    3.7.3 Proof of Theorem 1 (Algorithm)
    3.7.4 XPath for Sequence Queries
    3.7.5 From VPE to K*SQL
    3.7.6 Aggregates and Complexity

4 XSeq: High-Performance Complex Event Processing over Hierarchical Data
  4.1 XSeq Query Language
  4.2 Advanced Queries from Complex Event Processing
    4.2.1 Stock Analysis
    4.2.2 Social Networks
    4.2.3 Inventory Management
    4.2.4 Directory Search
    4.2.5 Genetics
    4.2.6 Protein, RNA and DNA Databases
    4.2.7 Temporal Queries
    4.2.8 Software Trace Analysis
  4.3 XSeq Optimization
    4.3.1 Efficient Query Plans via VPA
    4.3.2 Static VPA Optimization
    4.3.3 Run-time VPA Optimization
  4.4 Formal Semantics of XSeq
  4.5 Expressiveness and Complexity
    4.5.1 CXSeq
    4.5.2 Regularity of CXSeq and Complexity
    4.5.3 Query Evaluation Complexity
  4.6 Experiments
    4.6.1 Effectiveness of Different Optimizations
    4.6.2 Sequence Queries vs. XPath Engines
    4.6.3 Conventional Queries vs. XPath Engines
    4.6.4 Throughput for Different Types of Queries
  4.7 Previous Work
  4.8 Summary of XSeq
  4.9 Core XSeq Proof of Regularity
    4.9.1 Core XSeq with Variable Concatenation
    4.9.2 Core XSeq Basic

5 Trinity.RDF: A Distributed Graph Engine for Web Scale Graphs
  5.1 Join vs. Graph Exploration
    5.1.1 RDF and SPARQL
    5.1.2 Using Join Operations
    5.1.3 Using Graph Explorations
  5.2 System Architecture
  5.3 Data Modeling
    5.3.1 Modeling Graphs
    5.3.2 Graph Partitioning
    5.3.3 Indexing Predicates
    5.3.4 Basic Graph Operators
  5.4 Query Processing
    5.4.1 Overview
    5.4.2 Single Triple Pattern Matching
    5.4.3 Multiple Pattern Matching by Exploration
    5.4.4 Final Join after Exploration
    5.4.5 Exploration Plan Optimization
    5.4.6 Cost Estimation
  5.5 Experiments
  5.6 Related Work
  5.7 Summary of Trinity.RDF

II APPROXIMATION OPTIMIZATION

6 EARL: Early Accurate Results for Advanced Analytics on MapReduce
  6.1 Architecture
    6.1.1 Extending MapReduce
  6.2 Estimating Accuracy
    6.2.1 Accuracy Estimation Stage
    6.2.2 Sample Size and Number of Bootstraps
    6.2.3 Sampling
    6.2.4 Fault Tolerance
  6.3 Optimizations
    6.3.1 Inter-Iteration Optimization
    6.3.2 Intra-Iteration Optimization
  6.4 Current Implementation
  6.5 Experiments
    6.5.1 A Strong Case for EARL
    6.5.2 Approximate Median Computation
    6.5.3 EARL and Advanced Mining Algorithms

Description:
Excerpt from the table of contents: … XPath Engines; 4.6.3 Conventional Queries vs. XPath Engines; 4.6.4 Throughput for Different Types of Queries. Excerpt from the list of figures: 2.1 Tiny examples of nested words in different domains: XML and ge…; 6.4 An example of how a user job would work with the EARL framework.
