JOE CELKO’S THINKING IN SETS

The Morgan Kaufmann Series in Data Management Systems

Enterprise Knowledge Management
  David Loshin
Advanced SQL:1999—Understanding Object-Relational and Other Advanced Features
  Jim Melton
A Complete Guide to DB2 Universal Database
  Don Chamberlin
Business Process Change, Second Edition
  Paul Harmon
Universal Database Management: A Guide to Object/Relational Technology
  Cynthia Maro Saracco
Database Tuning: Principles, Experiments, and Troubleshooting Techniques
  Dennis Shasha and Philippe Bonnet
IT Manager’s Handbook, Second Edition
  Bill Holtsnider and Brian Jaffe
Readings in Database Systems, Third Edition
  Edited by Michael Stonebraker and Joseph M. Hellerstein
SQL:1999—Understanding Relational Language Components
  Jim Melton and Alan R. Simon
Joe Celko’s Puzzles and Answers, Second Edition
  Joe Celko
Understanding SQL’s Stored Procedures: A Complete Guide to SQL/PSM
  Jim Melton
Making Shoes for the Cobbler’s Children: Using Architecture and Patterns to Enable IT Governance
  Charles Betz
Information Visualization in Data Mining and Knowledge Discovery
  Edited by Usama Fayyad, Georges G. Grinstein, and Andreas Wierse
Principles of Multimedia Database Systems
  V. S. Subrahmanian
Java Data Mining: Strategy, Standard, and Practice
  Mark Hornik, Erik Marcade, and Sunil Venkayala
Transactional Information Systems: Theory, Algorithms, and Practice of Concurrency Control and Recovery
  Gerhard Weikum and Gottfried Vossen
Principles of Database Query Processing for Advanced Applications
  Clement T. Yu and Weiyi Meng
Joe Celko’s Analytics and OLAP in SQL
  Joe Celko
Spatial Databases: With Application to GIS
  Philippe Rigaux, Michel Scholl, and Agnes Voisard
Advanced Database Systems
  Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. Snodgrass, V. S. Subrahmanian, and Roberto Zicari
Data Preparation for Data Mining Using SAS
  Mamdouh Refaat
Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design
  Terry Halpin
Principles of Transaction Processing
  Philip A. Bernstein and Eric Newcomer
Querying XML: XQuery, XPath, and SQL/XML in Context
  Jim Melton and Stephen Buxton
Component Database Systems
  Edited by Klaus R. Dittrich and Andreas Geppert
Using the New DB2: IBM’s Object-Relational Database System
  Don Chamberlin
Data Mining: Concepts and Techniques, Second Edition
  Jiawei Han and Micheline Kamber
Managing Reference Data in Enterprise Databases: Binding Corporate Data to the Wider World
  Malcolm Chisholm
Distributed Algorithms
  Nancy A. Lynch
Database Modeling and Design: Logical Design, Fourth Edition
  Toby J. Teorey, Sam S. Lightstone, and Thomas P. Nadeau
Active Database Systems: Triggers and Rules for Advanced Database Processing
  Edited by Jennifer Widom and Stefano Ceri
Understanding SQL and Java Together: A Guide to SQLJ, JDBC, and Related Technologies
  Jim Melton and Andrew Eisenberg
Foundations of Multidimensional and Metric Data Structures
  Hanan Samet
Migrating Legacy Systems: Gateways, Interfaces, & the Incremental Approach
  Michael L. Brodie and Michael Stonebraker
Database: Principles, Programming, and Performance, Second Edition
  Patrick and Elizabeth O’Neil
Joe Celko’s SQL for Smarties: Advanced SQL Programming, Third Edition
  Joe Celko
Atomic Transactions
  Nancy Lynch, Michael Merritt, William Weihl, and Alan Fekete
The Object Data Standard: ODMG 3.0
  Edited by R. G. G. Cattell and Douglas K. Barry
Moving Objects Databases
  Ralf Hartmut Güting and Markus Schneider
Query Processing for Advanced Database Systems
  Edited by Johann Christoph Freytag, David Maier, and Gottfried Vossen
Data on the Web: From Relations to Semistructured Data and XML
  Serge Abiteboul, Peter Buneman, and Dan Suciu
Joe Celko’s SQL Programming Style
  Joe Celko
Transaction Processing: Concepts and Techniques
  Jim Gray and Andreas Reuter
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
  Ian Witten and Eibe Frank
Data Mining, Second Edition: Concepts and Techniques
  Ian Witten and Eibe Frank
Building an Object-Oriented Database System: The Story of O2
  Edited by François Bancilhon, Claude Delobel, and Paris Kanellakis
Joe Celko’s SQL for Smarties: Advanced SQL Programming, Second Edition
  Joe Celko
Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration
  Earl Cox
Database Transaction Models for Advanced Applications
  Edited by Ahmed K. Elmagarmid
Joe Celko’s Data and Databases: Concepts in Practice
  Joe Celko
Data Modeling Essentials, Third Edition
  Graeme C. Simsion and Graham C. Witt
A Guide to Developing Client/Server SQL Applications
  Setrag Khoshafian, Arvola Chan, Anna Wong, and Harry K. T. Wong
Developing Time-Oriented Database Applications in SQL
  Richard T. Snodgrass
Location-Based Services
  Jochen Schiller and Agnès Voisard
Database Modeling with Microsoft® Visio for Enterprise Architects
  Terry Halpin, Ken Evans, Patrick Hallock, and Bill Maclean
Web Farming for the Data Warehouse
  Richard D. Hackathorn
The Benchmark Handbook for Database and Transaction Processing Systems, Second Edition
  Edited by Jim Gray
Management of Heterogeneous and Autonomous Database Systems
  Edited by Ahmed Elmagarmid, Marek Rusinkiewicz, and Amit Sheth
Designing Data-Intensive Web Applications
  Stefano Ceri, Piero Fraternali, Aldo Bongio, Marco Brambilla, Sara Comai, and Maristella Matera
Camelot and Avalon: A Distributed Transaction Facility
  Edited by Jeffrey L. Eppinger, Lily B. Mummert, and Alfred Z. Spector
Object-Relational DBMSs: Tracking the Next Great Wave, Second Edition
  Michael Stonebraker and Paul Brown, with Dorothy Moore
Mining the Web: Discovering Knowledge from Hypertext Data
  Soumen Chakrabarti
Readings in Object-Oriented Database Systems
  Edited by Stanley B. Zdonik and David Maier

JOE CELKO’S THINKING IN SETS
Auxiliary, Temporal, and Virtual Tables in SQL

Joe Celko

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier

Publisher: Denise E. M. Penrose
Publishing Services Manager: George Morrison
Project Manager: Marilyn E. Rash
Assistant Editor: Mary E. James
Production Management: Multiscience Press, Inc.
Design Direction: Joanne Blank
Cover Design: Dick Hannus
Typesetting/Illustrations: diacriTech
Interior Printer: Sheridan Books
Cover Printer: Phoenix Color Corp.

Morgan Kaufmann Publishers is an imprint of Elsevier.
30 Corporate Drive, Burlington, MA 01803-4255

This book is printed on acid-free paper.

Copyright © 2008 by Elsevier Inc. All rights reserved.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior written permission of the publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.”

Library of Congress Cataloging-in-Publication Data
Celko, Joe.
[Thinking in sets]
Joe Celko’s thinking in sets : auxiliary, temporal, and virtual tables in SQL / Joe Celko.
p. cm.
Includes index.
ISBN 978-0-12-374137-0 (alk. paper)
1. SQL (Computer program language) 2. Declarative programming. I. Title. II. Title: Thinking in sets.
QA76.73.S67C463 2008
005.13—dc22
2007043898

For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.books.elsevier.com.

Printed in the United States
08 09 10 11 12    10 9 8 7 6 5 4 3 2 1

Working together to grow libraries in developing countries
www.elsevier.com | www.bookaid.org | www.sabre.org

To my god-niece, Olivia. When we are finished with her board books, we can start on SQL manuals before bedtime!
CONTENTS

Preface

1 SQL Is Declarative, Not Procedural
  1.1 Different Programming Models
  1.2 Different Data Models
    1.2.1 Columns Are Not Fields
    1.2.2 Rows Are Not Records
    1.2.3 Tables Are Not Files
    1.2.4 Relational Keys Are Not Record Locators
    1.2.5 Kinds of Keys
    1.2.6 Desirable Properties of Relational Keys
    1.2.7 Unique But Not Invariant
  1.3 Tables as Entities
  1.4 Tables as Relationships
  1.5 Statements Are Not Procedures
  1.6 Molecular, Atomic, and Subatomic Data Elements
    1.6.1 Table Splitting
    1.6.2 Column Splitting
    1.6.3 Temporal Splitting
    1.6.4 Faking Non-1NF Data
    1.6.5 Molecular Data Elements
    1.6.6 Isomer Data Elements
    1.6.7 Validating a Molecule

2 Hardware, Data Volume, and Maintaining Databases
  2.1 Parallelism
  2.2 Cheap Main Storage
  2.3 Solid-State Disk
  2.4 Cheaper Secondary and Tertiary Storage
  2.5 The Data Changed
  2.6 The Mindset Has Not Changed

3 Data Access and Records
  3.1 Sequential Access
    3.1.1 Tape-Searching Algorithms
  3.2 Indexes
    3.2.1 Single-Table Indexes
    3.2.2 Multiple-Table Indexes
    3.2.3 Type of Indexes
  3.3 Hashing
    3.3.1 Digit Selection
    3.3.2 Division Hashing
    3.3.3 Multiplication Hashing
    3.3.4 Folding
    3.3.5 Table Lookups
    3.3.6 Collisions
  3.4 Bit Vector Indexes
  3.5 Parallel Access
  3.6 Row and Column Storage
    3.6.1 Row-Based Storage
    3.6.2 Column-Based Storage
  3.7 JOIN Algorithms
    3.7.1 Nested-Loop Join Algorithm
    3.7.2 Sort-Merge Join Method
    3.7.3 Hash Join Method
    3.7.4 Shin’s Algorithm

4 Lookup Tables
  4.1 Data Element Names
  4.2 Multiparameter Lookup Tables
  4.3 Constants Table
  4.4 OTLT or MUCK Table Problems
  4.5 Definition of a Proper Table

5 Auxiliary Tables
  5.1 Sequence Table
    5.1.1 Creating a Sequence Table
    5.1.2 Sequence Constructor
    5.1.3 Replacing an Iterative Loop
  5.2 Permutations
    5.2.1 Permutations via Recursion
    5.2.2 Permutations via CROSS JOIN
  5.3 Functions
    5.3.1 Functions without a Simple Formula
  5.4 Encryption via Tables
  5.5 Random Numbers
  5.6 Interpolation

6 Views
  6.1 Mullins VIEW Usage Rules
    6.1.1 Efficient Access and Computations
    6.1.2 Column Renaming
    6.1.3 Proliferation Avoidance
    6.1.4 The VIEW Synchronization Rule
  6.2 Updatable and Read-Only VIEWs
  6.3 Types of VIEWs
    6.3.1 Single-Table Projection and Restriction
    6.3.2 Calculated Columns
    6.3.3 Translated Columns
    6.3.4 Grouped VIEWs
    6.3.5 UNIONed VIEWs
    6.3.6 JOINs in VIEWs
    6.3.7 Nested VIEWs
  6.4 Modeling Classes with Tables
    6.4.1 Class Hierarchies in SQL
    6.4.2 Subclasses via ASSERTIONs and TRIGGERs
  6.5 How VIEWs Are Handled in the Database System
    6.5.1 VIEW Column List
    6.5.2 VIEW Materialization
  6.6 In-Line Text Expansion
  6.7 WITH CHECK OPTION Clause
    6.7.1 WITH CHECK OPTION as CHECK( ) Clause
  6.8 Dropping VIEWs