
UCLA Electronic Theses and Dissertations PDF

290 Pages·2014·3.82 MB·English

Preview UCLA Electronic Theses and Dissertations

UCLA
UCLA Electronic Theses and Dissertations

Title: Approximation and Search Optimization on Massive Data Bases and Data Streams
Permalink: https://escholarship.org/uc/item/4pv2n0vs
Author: ZENG, KAI
Publication Date: 2014
Peer reviewed | Thesis/dissertation

eScholarship.org
Powered by the California Digital Library
University of California

UNIVERSITY OF CALIFORNIA
Los Angeles

Approximation and Search Optimization on Massive Data Bases and Data Streams

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science

by Kai Zeng

2014

© Copyright by Kai Zeng 2014

The dissertation of Kai Zeng is approved.
Junghoo Cho
Tyson Condie
Yingnian Wu
Carlo Zaniolo, Committee Chair

University of California, Los Angeles
2014

To my parents and my wife... for their unconditional love

TABLE OF CONTENTS

1 Introduction
  1.1 Search Challenges
  1.2 Analytics Challenges
  1.3 Overview and Contributions
    1.3.1 K*SQL and XSeq: Expressive and Efficient CEP Languages
    1.3.2 Trinity.RDF: Web Scale Graph Engine
    1.3.3 EARL and ABM: Bootstrap-Based Approximation Techniques for Interactive Data Analytics

I SEARCH OPTIMIZATION

2 Background: Nested Words and Visibly Pushdown Automata
  2.1 Nested Words
  2.2 Visibly Pushdown Automata
  2.3 Difference between Nested Words and VPAs

3 K*SQL: Unifying Languages and Query Execution for Relational and XML Sequences
  3.1 K*SQL By Examples
    3.1.1 Nested Kstars
    3.1.2 Linear-Hierarchical Data
  3.2 Expressive Power
    3.2.1 K*SQL vs. XPath
    3.2.2 K*SQL vs. Other Sequence Languages
    3.2.3 Monadic Second Order Logic
  3.3 Optimization
    3.3.1 Compile-time Optimization
    3.3.2 Optimization for Nested Constructs
  3.4 Experiments
    3.4.1 XML queries in K*SQL
    3.4.2 Query Execution Time
    3.4.3 Number of Backtracks
  3.5 Related Work
  3.6 Summary of K*SQL
  3.7 K*SQL Syntax and Expressive Power
    3.7.1 K*SQL Syntax
    3.7.2 K*SQL for Other Domains
    3.7.3 Proof of Theorem 1 (Algorithm)
    3.7.4 XPath for Sequence Queries
    3.7.5 From VPE to K*SQL
    3.7.6 Aggregates and Complexity

4 XSeq: High-Performance Complex Event Processing over Hierarchical Data
  4.1 XSeq Query Language
  4.2 Advanced Queries from Complex Event Processing
    4.2.1 Stock Analysis
    4.2.2 Social Networks
    4.2.3 Inventory Management
    4.2.4 Directory Search
    4.2.5 Genetics
    4.2.6 Protein, RNA and DNA Databases
    4.2.7 Temporal Queries
    4.2.8 Software Trace Analysis
  4.3 XSeq Optimization
    4.3.1 Efficient Query Plans via VPA
    4.3.2 Static VPA Optimization
    4.3.3 Run-time VPA Optimization
  4.4 Formal Semantics of XSeq
  4.5 Expressiveness and Complexity
    4.5.1 CXSeq
    4.5.2 Regularity of CXSeq and Complexity
    4.5.3 Query Evaluation Complexity
  4.6 Experiments
    4.6.1 Effectiveness of Different Optimizations
    4.6.2 Sequence Queries vs. XPath Engines
    4.6.3 Conventional Queries vs. XPath Engines
    4.6.4 Throughput for Different Types of Queries
  4.7 Previous Work
  4.8 Summary of XSeq
  4.9 Core XSeq Proof of Regularity
    4.9.1 Core XSeq with Variable Concatenation
    4.9.2 Core XSeq Basic

5 Trinity.RDF: A Distributed Graph Engine for Web Scale Graphs
  5.1 Join vs. Graph Exploration
    5.1.1 RDF and SPARQL
    5.1.2 Using Join Operations
    5.1.3 Using Graph Explorations
  5.2 System Architecture
  5.3 Data Modeling
    5.3.1 Modeling Graphs
    5.3.2 Graph Partitioning
    5.3.3 Indexing Predicates
    5.3.4 Basic Graph Operators
  5.4 Query Processing
    5.4.1 Overview
    5.4.2 Single Triple Pattern Matching
    5.4.3 Multiple Pattern Matching by Exploration
    5.4.4 Final Join after Exploration
    5.4.5 Exploration Plan Optimization
    5.4.6 Cost Estimation
  5.5 Experiments
  5.6 Related Work
  5.7 Summary of Trinity.RDF

II APPROXIMATION OPTIMIZATION

6 EARL: Early Accurate Results for Advanced Analytics on MapReduce
  6.1 Architecture
    6.1.1 Extending MapReduce
  6.2 Estimating Accuracy
    6.2.1 Accuracy Estimation Stage
    6.2.2 Sample Size and Number of Bootstraps
    6.2.3 Sampling
    6.2.4 Fault Tolerance
  6.3 Optimizations
    6.3.1 Inter-Iteration Optimization
    6.3.2 Intra-Iteration Optimization
  6.4 Current Implementation
  6.5 Experiments
    6.5.1 A Strong Case for EARL
    6.5.2 Approximate Median Computation
    6.5.3 EARL and Advanced Mining Algorithms

Description:
5.4.3 Multiple Pattern Matching by Exploration. 2.1 Tiny examples of nested words in different domains: XML and genomics. 4.1 A query in XPath 2.0/XQuery for a sequence of 'falling price' in Nas- . 7.6 (a) ABM vs. bootstrap on user-defined quality measures for Skewed TPC-H

