Replacing GEQO Join ordering via Simulated Annealing Jan Urban´ski [email protected] UniversityofWarsaw/Flumotion May 21, 2010 JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 1/46 For those following at home Getting the slides $ wget http://wulczer.org/saio.pdf Trying it out $ git clone git://wulczer.org/saio.git $ cd saio && make $ psql =# load ’.../saio.so’; Do not try to compile against an assert-enabled build. Do not be scared by lots of ugly code. JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 2/46 1 The problem Determining join order for large queries GEQO, the genetic query optimiser 2 The solution Simulated Annealing overview PostgreSQL specifics Query tree transformations 3 The results Comparison with GEQO 4 The future Development focuses 5 The end JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 3/46 Theproblem Determiningjoinorderforlargequeries Outline 1 The problem Determining join order for large queries GEQO, the genetic query optimiser 2 The solution 3 The results 4 The future 5 The end JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 4/46 Theproblem Determiningjoinorderforlargequeries Getting the optimal join order (cid:73) part of of planning a query is determining the order in which relations are joined (cid:73) it is not unusual to have queries that join lots of relations (cid:73) JOIN and subquery flattening contributes to the number or relations to join (cid:73) automatically generated queries can involve very large joins JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 5/46 Theproblem Determiningjoinorderforlargequeries Problems with join ordering (cid:73) finding the optimal join order is an NP-hard problem (cid:73) considering all possible ways to do a join can exhaust available memory (cid:73) not all join orders are valid, because of: (cid:73) outer joins enforcing a certain join order (cid:73) IN and EXISTS clauses that get converted to joins (cid:73) joins with restriction clauses are preferable to Cartesian joins JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 6/46 Theproblem GEQO,thegeneticqueryoptimiser Outline 1 The problem Determining join order for large queries GEQO, the genetic query optimiser 2 The solution 3 The results 4 The future 5 The end JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 7/46 Theproblem GEQO,thegeneticqueryoptimiser Randomisation helps (cid:73) PostgreSQL switches from exhaustive search to a randomised algorithm after a certain limit (cid:73) GEQO starts by joining the relations in any order (cid:73) and then proceeds to randomly change the join order (cid:73) genetic algorithm techniques are used to choose the cheapest join order JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 8/46 Theproblem GEQO,thegeneticqueryoptimiser Problems with GEQO (cid:73) has lots of dead/experimental code (cid:73) there is a TODO item to remove it (cid:73) nobody really cares about it (cid:73) is an adaptation of an algorithm to solve TSP, not necessarily best suited to join ordering (cid:73) requires some cooperation from the planner, which violates abstractions JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 9/46 Thesolution SimulatedAnnealingoverview Outline 1 The problem 2 The solution Simulated Annealing overview PostgreSQL specifics Query tree transformations 3 The results 4 The future 5 The end JanUrban´ski [email protected](UniversityofWarRsaewpla/ciFnlgumGEotQioOn) May21,2010 10/46
Description: