Peter Z. Revesz

INTRODUCTION TO CONSTRAINT DATABASES

With 112 Illustrations

Springer

Peter Z. Revesz
Department of Computer Science and Engineering
University of Nebraska
Lincoln, NE 68588-0115, USA
[email protected]

Series Editors

David Gries
Department of Computer Science
415 Boyd Studies Research Center
The University of Georgia
Athens, GA 30605, USA

Fred B. Schneider
Department of Computer Science
Upson Hall
Cornell University
Ithaca, NY 14853-7501, USA

Library of Congress Cataloging-in-Publication Data
Revesz, Peter Z.
Introduction to constraint databases / Peter Z. Revesz.
p. cm.—(Texts in computer science)
Includes bibliographical references and index.
ISBN 0-387-98729-0 (alk. paper)
1. Constraint databases. I. Title. II. Series.
QA76.9.C67 R48 2001
005.75—dc21 2001041134

Printed on acid-free paper.

© 2002 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Michael Koy; manufacturing supervised by Jacqui Ashri.
Typeset pages prepared using the author's LaTeX 2ε files by Integre Technical Publishing Company, Inc.
Printed and bound by Hamilton Printing Co., Rensselaer, NY.
Printed in the United States of America.
9 8 7 6 5 4 3 2 1

ISBN 0-387-98729-0 SPIN 10707989

Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH

"Hold Infinity in the palm of your hand,
And Eternity in an hour."
—William Blake: Auguries of Innocence

Preface

Purpose and Goals

This textbook provides comprehensive coverage of constraint databases. The primary audience of the book is advanced undergraduate and beginning graduate students. For them the extensive set of exercises at the end of each chapter will be useful. The text and the exercises assume as prerequisite only basic discrete mathematics, linear algebra, and programming knowledge. Many database experts will also find the bibliographic notes after each chapter a valuable reference for further reading. For both students and database experts the sample systems discussed in Chapters 18–20, as well as some slides, are available free from the author's Web page: http://cse.unl.edu/~revesz

Topics Coverage and Organization

The dependencies of the chapters are shown in the chart on the next page.

[Chart: chapter dependencies, with the chapters grouped under Models, Queries, Evaluation, Systems, and Applications]

The material covered is organized into five broad categories:

• Models: The material covered includes data abstraction and the relational data model (Chapter 1), the constraint data model (Chapter 2), several spatiotemporal data models (Chapter 13.1–3), and data storage (Chapter 17).
• Queries: Relational algebra and SQL (Chapter 3), Datalog (Chapter 4), Datalog with negation and aggregation (Chapter 5), refinement queries (Chapter 7), and spatiotemporal database queries (Chapter 13.4).
• Evaluation: The evaluation issues discussed are safety (Chapter 8), evaluation and quantifier elimination (Chapter 9), computational complexity (Chapter 10), certification of safety (Chapter 11), and an implementation outline with data structures and pseudocode for key algorithms (Chapter 12).
  There is also coverage of interoperability among several types of constraint and spatiotemporal databases (Chapter 14), approximation data representation and query evaluation (Chapter 15), and data visualization (Chapter 16).
• Systems: The book describes a sample linear constraint database system (Chapter 18), a Boolean constraint database system (Chapter 19), and a spatiotemporal database system (Chapter 20).
• Applications: Sample applications are included in three diverse areas: computer vision (Chapter 21), bioinformatics (Chapter 22), and environmental modeling (Chapter 23).

A special strength of the book is that it allows a flexible course design. A good introductory course would cover some chapters from each category. For example, an introductory course focused on linear constraint databases may cover Chapters 1, 2, 3, 4, 5, 8, 18, and 21. An introductory course focused on Boolean constraints may replace the last three of these with Chapters 6, 19, and 22.

The text also allows several types of advanced courses. An advanced course focused on query evaluation may start with a brief review of Chapters 1–5 and then cover Chapters 8, 9, 10, 12, and 17. An advanced course focused on spatiotemporal databases may briefly review Chapters 1–4 and then cover Chapters 13–17, 20, and 23.

Acknowledgments

I would like to thank Berthe Choueiry and Manolis Koubarakis for reviewing almost the entire book, French Roy and James Van Etten for reviewing the specialty chapter on bioinformatics, and Franz Mora for reviewing the chapter on environmental modeling. All the chapters except the first five are based on material presented in numerous journals, conferences, and workshops. I thank my coauthors on many of those papers, including the late Paris Kanellakis, my Ph.D. thesis advisor, and Gabriel Kuper, my two coauthors on the original paper on constraint databases. I also thank the following more recent coauthors from whom I learned much: Andras Benczur, Jan Chomicki, Floris Geerts, Gosta Grahne, Alberto Mendelzon, Agnes Novak, Raghu Ramakrishnan, Divesh Srivastava, and Jan Van den Bussche. I also thank the anonymous reviewers of our papers.

The book was used in manuscript form as a textbook for several years at the University of Nebraska. I would like to thank the following students for their comments on the text: Brian Boon, Mengchu Cai, Rui Chen, Yi Chen, Ying Deng, Dinesh Keshwani, Lixin Li, and Min Ouyang. In addition, the following students contributed to the material presented in the book via their programming efforts of the constraint database systems described in Chapters 18–20: Jo-Hag Byon, Pradip Kanjamala, Yiming Li, Yuguo Liu, Prashanth Nandavanam, Andrew Salamon, Yonghui Wang, and Lei Zhang. I also thank Pradeep Gundavarapu and Vijay Eadala for their help in drawing figures or contributing some examples.

I am grateful to the editors at Springer-Verlag, especially Wayne Wheeler and Wayne Yuhasz, for their careful editing and attention to several details in the production of the book. The preparation of this book was supported in part by the U.S. National Science Foundation under grants IRI-9625055 and IRI-9632871, by a Gallup Research Professorship, and by the University of Nebraska-Lincoln.

Finally, I would like to thank Lilla, my wife, for her enthusiasm and patience during the writing of this book.

Lincoln, Nebraska, USA
Peter Revesz

Contents

Preface

1 Infinite Relational Databases
  1.1 The View Level
  1.2 The Logical Level
  1.3 Abstract Data Types

2 Constraint Databases
  2.1 Constraints
  2.2 The Constraint Data Model
  2.3 Data Abstraction

3 Relational Algebra and SQL
  3.1 Relational Algebra
  3.2 SQL

4 Datalog Queries
  4.1 Syntax
  4.2 Datalog with Sets
  4.3 Datalog with Boolean Constraints
  4.4 Datalog with Abstract Data Types
  4.5 Semantics
  4.6 Recursive Datalog Queries

5 Aggregation and Negation Queries
  5.1 Set Grouping
  5.2 Average, Count, Sum
  5.3 Area and Volume
  5.4 Minimum and Maximum
  5.5 Negation
  5.6 Stratified Datalog Queries

6 Constraint Automata
  6.1 Definition of Constraint Automata
  6.2 Simplifications of Constraint Automata
  6.3 Analysis of Reachable Configurations

7 Refinement Queries
  7.1 The Universal Object Relation Data Model
  7.2 Closed, Open, and Possible Worlds
  7.3 Syntax
  7.4 Semantics
  7.5 Projection Queries
  7.6 Evaluation of Refinement Queries

8 Safe Query Languages
  8.1 Safety Levels
  8.2 Restriction
  8.3 Safe Aggregation and Negation Queries
  8.4 Safe Refinement and Projection Queries

9 Evaluation of Queries
  9.1 Quantifier Elimination and Satisfiability
  9.2 Evaluation of Relational Algebra Queries
  9.3 Evaluation of SQL Queries
  9.4 Evaluation of Datalog Queries

10 Computational Complexity
  10.1 Complexity Classes and Measures
  10.2 Complexity of Relational Algebra
  10.3 Complexity of Datalog
  10.4 Complexity of Stratified Datalog

11 Certification
  11.1 Constant Propagation
  11.2 Variable Independence
  11.3 Monotonicity
  11.4 Acyclicity
  11.5 Tightening Bounds
  11.6 Tightening Coefficients
  11.7 Vector Addition System
  11.8 Positive Stochastic Matrix Multiplication

12 Implementation Methods
  12.1 Evaluation with Gap-Graphs
  12.2 Evaluation with Matrices
  12.3 Boolean Constraints
  12.4 Optimization of Relational Algebra

13 Spatiotemporal Databases
  13.1 Extreme Point Data Models
  13.2 Parametric Extreme Point Data Models
  13.3 Geometric Transformation Data Models
  13.4 Queries

14 Interoperability
  14.1 Data Interoperability
  14.2 Query Interoperability
  14.3 Other Types of Interoperability

15 Approximation by Constraints
  15.1 Triangulated Irregular Networks
  15.2 Piecewise Linear Approximation of Time Series
  15.3 Parametric Triangulated Irregular Networks
  15.4 Parametric Rectangles Approximation of Raster Movies

16 Data Visualization
  16.1 Isometric Color Bands
  16.2 Value-by-Area Cartogram
  16.3 Animation of Moving Objects

17 Indexing
  17.1 Minimum Bounding Parametric Rectangles
  17.2 The Parametric R-Tree Index Structure
  17.3 Indexing Constraint Databases

18 The MLPQ System
  18.1 The MLPQ Database System Architecture
  18.2 MLPQ Input Files
  18.3 The MLPQ Graphical User Interface
  18.4 Recursive Queries

19 The DISCO System
  19.1 DISCO Queries
  19.2 Implementation
  19.3 Using the DISCO System
  19.4 Extensibility of the DISCO System

20 The PReSTO System
  20.1 PReSTO Input Files
  20.2 The PReSTO Graphical User Interface
  20.3 Implementation

21 Computer Vision
  21.1 Affine Invariance
  21.2 Affine-Invariant Similarity Measures
  21.3 The Color Ratios Similarity Measure

22 Bioinformatics
  22.1 The Genome Map Assembly Problem
  22.2 The Big-Bag Matching Problem
  22.3 A Constraint-Automata Solution

23 Environmental Modeling
  23.1 Predictive Spread Modeling
  23.2 Visualization
  23.3 A Decision Support System

Bibliography

Index