Conceptual Database Design AN ETY-REATINSI APPROACH BATINI CERI N AXVJAT H E CONCEPTUAL DATABASE DESIGN An Entity-Relationship Approach F_ Selected Titles from The Benjamin/Cummings Series in Computer Science G. S. Almasi, T. J. Watson, and A. Gottlieb Highly Parallel Computing, Second Edition (1992) G. Andrews Concurrent Programming: Principles and Practice (1991) G. Andrews and R. A. Olsson Concurrent Programming with SR (1992) P. Atzeni, V. De Antonellis, and C. Batini Relational Database Theory: A Comprehensive Introduction (1992) G. Booch Object-Oriented Design with Applications (1991) R. Elmasri and S. B. Navathe Fundamentals of Database Systems, Second Edition (1992) L. Kerschberg, Editor Expert Database Systems: Proceedings from the First International Workshop (1986) L. Kerschberg, Editor Expert Database Systems: Proceedings from the First International Conference (1988) L. Kerschberg, Editor Expert Database Systems: Proceedings from the Second International Workshop (1989) F. R. McFadden and J. A. Hoffer Database Management, Third Edition (1991) w CONCEPTUAL DATABASE DESIGN An Entity-Relationship Approach CARLO BATINI University di Roma "La Sapienza" STEFANO CERI Politecnico di Milano SHAMKANT B. NAVATHE Georgia Institute of Technology The Benjamin/Cummings Publishing Company, Inc. Redwood City, California • Menlo Park, California • Reading, Massachusetts New York • Don Mills, Ontario • Wokingham, U.K. • Amsterdam Bonn • Sydney • Singapore • Tokyo • Madrid • San Juan Sponsoring editor: Alan Apt Production supervisor: Laura Kenney Copyeditor: Nicholas Murray Technical artist: Merry Finley Cover designer: Gary Head Production management: The Cowans Composition: The Cowans Copyright © 1992 by Carlo Batini, Stefano Ceri, and Shamkant B. Navathe. Library of Congress Cataloging-in-Publication Data Batini, Carlo. Conceptual database design: an entity-relationship approach/ Carlo Batini, Stefano Ceri, Shamkant B. Navathe. 496 p. cm. Includes index. 1. Data base design. I. Ceri, Stefano, II. Navathe, Shamkant III. Title. QA76.9.D26B38 199 005.74--dc2O 91-15635 ISBN 0-8053-0244-1 345678910-DO-95 94 The Benjamin/Cummings Publishing Company, Inc. 390 Bridge Parkway Redwood City, California 94065 To my parents: Curzio, who died in 1975, and Laura, who died fifteen years later C. B. To my dear parents: Luciana Arcidiacono and Mauro Ceri S.C. To my parents: Blialchandra and Vijaya Navathe, for their encouragement and support S. B. N. PREFACE Background Database design is the process of determining the organization of a database, including its structure, contents, and the applications to be run. For a long time, database design was considered a task for experts, and was regarded as more of an art than a science. However, much progress has been made in database design, and it is now considered a stable discipline, with its own methods and techniques. Due to the spread of databases in industry and government spanning commercial and a variety of scientific and technical applications, database design is playing a central role in the information resource manage- ment of most organizations. Database design has also become a part of the general background of computer scientists, much like the ability of building algorithms using a conventional programming language. Database design is normally done in three phases. The first phase, called conceptual design, produces a high-level, abstract representation of reality. The second phase, called logical design, translates this representation into specifications that can be implemented on and processed by a computer system. The third phase, called physical design, determines the physical storage structures and access methods required for efficient access to the contents of a database from secondary storage devices. This book deals with the first two phases of database design, with a strong focus on the issues related to the user and the application rather than the system and a specific hardware/software environment. The conceptual and logical design phases can be per- formed independently of the choice of a particular database management system (DBMS). Therefore, it is assumed in the text that the knowledge of general database concepts and/ or experience with DBMSs constitutes a common background for the reader. We believe that a systematic and thorough execution of these early phases of design pays off immensely in the long term. In fact, many organizations are discovering the need to do conceptual and logical design at the same time as they move over to the relational and object-oriented database technology. vii viii PREFACE In this book, we use Chen's Entity-Relationship (ER) model with some enhancements needed for a better conceptual representation. This model is extensively used in many design methodologies, has an effective graphic representation, and is the de facto standard of most automatic tools for supporting database design. Though this book focuses on conceptual database design, we present a joint methodology for conceptual database design and functional analysis. The proposed mixed approach is based on well-known techniques that are common to both approaches. Purpose of This Book The primary goals of this book are the following: "* To provide a thorough and systematic treatment of conceptual and logical design. "* To base this treatment on the well-accepted Entity-Relationship model. "* To advocate that conceptual design and functional analysis be conducted to- gether. "* To address completely the translation of the conceptual design in the Entity- Relationship model into the three popular data models: relational, network, and hierarchical. We also address the problem of reverse engineering from these three models into the ER model. "*T o illustrate the concepts via a realistic, large case study. "*T o provide a survey of the state of the art of design tools. "* To provide enough pedagogical support for students of this subject in terms of exercises and bibliographic notes on pertinent literature. Audience The main activity of conceptual design is to understand and model reality; this task is difficult and is usually performed only by experts. Once it is captured, the logical design task is fairly straight forward. The main objective of this book is to discuss conceptual design not only for the benefit of the experts, but to introduce it to a much broader audience: 1. Students, who require a precise, rigorous treatment of conceptual and logical database design to complement a first course on database models and systems. 2. Practitioners (database administrators, analysts, consultants, and database pro- grammers), who will use this material to formalize and solve database design problems that are typically ill-defined. We believe that the methodology presented in this book can be adapted to apply to most design situations and hence will help designers solve their design problems in a systematic way. PREFACE ix 3. Users of databases, who need a foundation of knowledge to communicate with the database administration staff in order to specify their needs; they will also be able to monitor and control the design process and to understand the meaning and the structure of the database stored in their information system. This book is self-contained: all concepts are defined before their use. However, the book does not include a description of database systems features or of languages that can be used for programming database systems. Hence, it assumes as a prerequisite some knowledge about database systems, typically obtained from a first course on databases, or from exposure to the concrete use of databases. We recommend the use of Fundamentals of Database Systems by Elmasri and Navathe (Benjamin/Cummings, 1989) as a compre- hensive source for reference material. Chapters 12, 13, and 14 provide summary intro- ductions to the relational, network, and hierarchical data models, respectively. Outline of the Book The book is divided into three parts, preceded by an introductory chapter. The last part concludes with a guest chapter by David Reiner on design tools. Part 1 follows a data-driven approach and treats conceptual database design as independent of application design. The first chapter illustrates the role of database design within the information systems life cycle and the distinction between data- and function-driven approaches to informa- tion systems design. Chapter 2 presents data modeling concepts and specifically the ER model, so that after reading it the reader is able to understand ER schemas. Chapter 3 presents design primitives and strategies for designing ER schemas; at the end of this chapter, the reader should be able to build small ER schemas. Chapter 4 is divided into three sections, each one illustrating specific approaches to conceptual design based on different types of initial requirements: textual descriptions, forms, and COBOL record formats. Each section can be read independently. Chapter 5 describes how different schemas should be integrated to generate a unique global schema. Chapter 6 shows how a conceptual schema should be restructured in order to improve its qualities (including completeness, minimality, expressiveness, readability, and normal- ization). Chapter 7 describes how conceptual design should be documented by collecting several design descriptions, and how such documentation can be used for database maintenance and for data dictionary integration. Part 2 follows a joint data- and function-driven approach and integrates conceptual modeling with functional analysis. Chapter 8 deals with functional analysis by introducing the dataflow model and by showing design primitives and strategies. Chapter 9 illustrates the joint data- and function-driven approach to conceptual design of data and functions; this method produces a high-level navigational specification of databaseo perations that are useful for the subsequent logical and physical database design. Chapter 10 presents a large case study. We present a fairly realistic example of a bus company that offers a variety of tours. We have tried to address the aspects of the bus company operations that are relevant to the design of the database. X PREFACE Part 3 of the book addresses logical design, which is the process of converting the conceptual design into an implementable database structure in some specific database management system. We first address model-independent design; that is, we consider simplifying transformations on the conceptual schema without any regard to the final target data model. Then we consider mapping the conceptual schema into each of the three prominent data models: relational, network, and hierarchical. Chapter 11 deals with model-independent logical design using the Entity-Relationship model, and describes the initial transformations of the conceptual schema into a simpli- fied, intermediate, conceptual-to-logical schema. The next three chapters use this simplified schema as a starting point for subsequent transformations into the commercially imple- mented dominant families of DBMSs. Chapters 12 through 14 transform a conceptual schema in the ER model into a re- lational schema, a network (DBTG or CODASYL) schema, and a hierarchical schema re- spectively. In these three chapters we have summarized the essential features, constraints, and languages associated with these models. In each of the above chapters, two additional issues are addressed. First, we consider the translation of operations on the conceptual schema into appropriate data-manipulation languages for the target data models (relational, network, or hierarchical). These are illustrated through example operations on the case- study database. Second, the problem of reverse engineering is addressed for each model, so that existing database schemas in the respective models may be abstracted or reverse engineered into a conceptual schema. Chapter 15 on database design tools is contributed by David Reiner. It first discusses the general issues, including the architecture and desirable features of design tools, and then describes some of the tools for computer-assisted database design currently available as research prototypes or on the commercial market. The layout of the book, shown in the accompanying chart, indicates precedences among chapters and suggests several reading sequences. Acknowledgments Our approach to database design has been strongly influenced by the Dataid Project, developed in Italy between 1980 and 1985, and sponsored by the National Research Council. We thank our colleagues who actively participated in the project, particularly Antonio Albano, Valeria de Antonellis, and Antonio di Leva. Each of the three authors has separately conducted research in database design, together with other colleagues. In particular, Carlo Batini wishes to acknowledge Maurizio Lenzerini, Giuseppe Di Battista, and Giuseppe Santucci. Stefano Ceri wishes to acknowledge the cooperation of Giampio Bracchi, Giuseppe Pelagatti, Barbara Pernici, Paola Mostacci, and Federico Barbic during the years of activity of the Dataid Project, sponsored by the National Research Council and by a joint grant with the National Science Foundation. Many students of the Politecnico di Milano have conducted research in database design with him as part of their theses; he would like to