Concurrency control in distributed database systems PDF

344 Pages·1989·14.6 MB·English
Preview Concurrency control in distributed database systems

STUDIES IN COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE 3

Editors:
H. Kobayashi, IBM Japan Ltd., Tokyo
M. Nivat, Université Paris VII, Paris

NORTH-HOLLAND - AMSTERDAM · NEW YORK · OXFORD · TOKYO

CONCURRENCY CONTROL IN DISTRIBUTED DATABASE SYSTEMS

Wojciech CELLARY
Politechnika Poznanska, Poznan, Poland

Erol GELENBE
École des Hautes Études en Informatique, Université René Descartes, Paris, France

Tadeusz MORZY
Politechnika Poznanska, Poznan, Poland

1988
NORTH-HOLLAND - AMSTERDAM · NEW YORK · OXFORD · TOKYO

© Elsevier Science Publishers B.V., 1988

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

ISBN: 0 444 70409 4

Publishers:
ELSEVIER SCIENCE PUBLISHERS B.V.
P.O. BOX 1991
1000 BZ AMSTERDAM
THE NETHERLANDS

Sole distributors for the U.S.A. and Canada:
ELSEVIER SCIENCE PUBLISHING COMPANY, INC.
52 VANDERBILT AVENUE
NEW YORK, N.Y. 10017
U.S.A.

PRINTED IN THE NETHERLANDS

To those whose patience and encouragement simplified our task:
Daromira, Kasia, Marcin and Przemko
Deniz, Pamir and Pasha
Anna and Mikolaj

Preface

In recent years research and practical applications in the area of distributed systems have developed rapidly, stimulated by several factors. In the first place, this is a consequence of the significant progress in two fields of computer science, namely computer networks and database systems. These areas have constituted the technical and scientific foundations for the development of distributed systems.

This rapid development is also the result of the need for such systems for management and control applications.
Distributed computer systems arise mainly because of the distributed nature of many engineering and management systems such as banking systems, systems for production line automation, service systems, reservation systems, inventory systems, information retrieval systems, military systems, etc. The distributed nature of these applications is better satisfied by distributed computer systems than by centralized configurations.

The third motivation for distributed systems is of an economic nature. The development of VLSI circuit technology has considerably reduced the cost of computer equipment, and changed the orders of magnitude of its price/performance ratio. Present day "small" computers offer at far lower cost many of the capabilities which were previously provided only by large mainframes. Thus it has become possible to construct distributed computer systems consisting of many small yet powerful coupled computers instead of installing one large mainframe.

This evolution of computer technology has also lowered the relative cost of computing versus communication and has provided high speed local area networks. The cost of communication facilities is now comparable with that of powerful mini-computers. Therefore, again it is worth constructing a distributed system in which computers interchange computation results between them via communication links instead of installing one large mainframe in which data is gathered via communication links.

Another attractive aspect of distributed systems is the possibility of simpler software design. Individual processors can thus be dedicated to particular functions of the system, leading to the elimination of very complex multiprogramming software usually associated with large mainframes.

Furthermore, the most important indices characterizing the quality of computer systems can be improved by distributed systems.
In particular they allow:

• increased reliability and accessibility of the system due to the physical replication and distribution of computer resources (i.e. data, computing power, etc.), since the crash of a single site does not necessarily affect the other sites and does not lead to the unavailability of the whole system;
• better system performance as a consequence of the increased level of parallel processing, also obtained by bringing the system resources closer to data sources and users;
• increased flexibility of the system resulting from its modular and open structure which allows growth and smoother change of functions and capacity;
• increased data security due to better protection in the case of hardware and software failures or attempts to destroy data.

Some of the most advanced types of distributed systems are Distributed Database Systems (DDBS), which may be defined as integrated database systems composed of autonomous local databases geographically distributed and interconnected by a computer network.

Research in the field of DDBSs has been growing rapidly since the mid-seventies. At present DDBSs are in the initial stages of commercialization. Experimental DDBSs such as SDD-1, R*, SIRIUS-DELTA, Distributed-Ingres, DDM and POREL have been tested and evaluated. Some commercial systems, such as ENCOMPASS from Tandem and CICS/ISC from IBM, are already available.

New problems arise in distributed database systems, in comparison with centralized database systems, with respect to their management. A DDBS is managed by a Distributed Database Management System (DDBMS) whose main task is to give the users a "transparent" view of the distributed structure of the database, i.e. the illusion of having a monolithic and centralized database at their disposal. Distribution transparency, i.e.
location and replication transparency, implies that the conceptual and external-level problems (using the ANSI/SPARC terminology) of distributed databases do not essentially differ from similar issues in centralized database systems. On the other hand, the internal-level problems, i.e. physical database design and DDBS management, are specific and qualitatively new.

The main issues in DDBS management can be classified in three principal groups: concurrency control, query processing optimization, and reliability. Solutions to these problems in a centralized environment are inappropriate for a distributed environment because of differences in the internal-level structure of the databases. Solving them effectively is a precondition for taking full advantage of DDBS structure and applications.

The fundamental problem facing the designers of DDBMSs is that of the correct control of concurrent access to the distributed database by many different users. This can be viewed as the design of an appropriate concurrency control algorithm. The construction of concurrency control algorithms is of key importance to the whole management of distributed databases. A solution of this problem will influence the solution of the two remaining issues.

The general aim of a concurrency control algorithm is to ensure consistency of the distributed database and the correct completion of each transaction initiated in the system. An obvious additional requirement is to minimize overhead and transaction response time, and to maximize DDBS throughput.

In the study of concurrency control in DDBSs three successive phases can be distinguished. Initially, there was an attempt to adopt concurrency control algorithms designed for multiaccess but centralized database systems. This attempt was not successful for the following reasons. In a DDBS every transaction can request simultaneous access to many local databases located on physically dispersed computer sites.
Thus, the concurrency control problem for DDBSs is more general than the similar problem for centralized database systems. Moreover, in centralized database systems concurrency control has to ensure the internal consistency of one single local database. In DDBSs concurrency control has to guarantee the internal consistency of several different local databases and the external consistency of the distributed database, understood as the identity of copies of the data items.

Furthermore, in DDBSs no computer site will in general hold full information on the global state of the whole system. Hence, all control decisions taken at a site of a DDBS have to be made on the basis of incomplete and not entirely up-to-date information on the activities of the remaining sites. This fact must be taken into consideration in every concurrency control method.

More recently, three basic methods were designed in relation to the syntactic model of concurrency control, i.e. the model in which no semantic information on transactions or data is assumed. These are locking, timestamp ordering and validation. Studies related to the syntactic model of concurrency control are still of interest. At present, research focuses on integrating the basic concurrency control methods and the construction of hybrid methods. Algorithms using hybrid methods, e.g. the bi-ordering locking algorithm, guarantee better performance and eliminate system performance failures (deadlock, permanent blocking, and cyclic and infinite restarting), which can prevent some transactions from completing. The global resolution of the problems of DDBS consistency and DDBS performance failures is one of the major advantages of hybrid methods.

In the next phase of studies on concurrency control problems a multiversion data model was assumed. In this model every data item is a sequence of versions created as a result of successive updates.
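
The multiversion data model just described can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not an algorithm taken from the book: the class name `MultiversionItem`, its `write` and `read` methods, the use of integer timestamps, and the rule that a read returns the latest version written at or before the reader's timestamp are all hypothetical choices made for this example.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class MultiversionItem:
    """Hypothetical data item kept as a sequence of timestamped versions."""
    timestamps: list = field(default_factory=list)  # ascending write timestamps
    values: list = field(default_factory=list)      # values[i] was written at timestamps[i]

    def write(self, ts, value):
        # An update appends a new version instead of overwriting the old one.
        # Assumption for this sketch: writes arrive in increasing timestamp order.
        if self.timestamps and ts <= self.timestamps[-1]:
            raise ValueError("write timestamps must be strictly increasing")
        self.timestamps.append(ts)
        self.values.append(value)

    def read(self, ts):
        # Return the latest version written at or before timestamp ts.
        i = bisect_right(self.timestamps, ts)
        if i == 0:
            raise KeyError("no version exists at or before this timestamp")
        return self.values[i - 1]


item = MultiversionItem()
item.write(10, "a")
item.write(20, "b")
old_view = item.read(15)  # a reader at timestamp 15 sees the version written at 10
new_view = item.read(25)  # a reader at timestamp 25 sees the version written at 20
```

Because old versions are retained, a read with an older timestamp can be served from the stored history without waiting for, or being rejected by, a later update.
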
Thus, the history of data updates is stored in the database. Multiversion DDBSs are attractive to DDBS designers for several reasons. They allow a higher degree of concurrency, they can be combined with reliability mechanisms in a natural way, and they can be easily designed so that no queries are delayed or rejected.

In recent years there has been a new trend in research characterized by a shift from the syntactic model of concurrency control to the semantic model. Semantic information can concern data (e.g. physical structure of the database, consistency constraints, etc.) or the set of transactions. At present only preliminary results in this area are available. However, this trend seems to be very promising and will presumably provide significant results in the near future.

The purpose of this monograph is to present DDBS concurrency control algorithms and their related performance issues. The most recent results have been taken into consideration. A detailed analysis and selection of these results has been made so as to include those which, in the authors' opinion, will promote applications and progress in the field. It can also be said that the application of the methods and algorithms presented in the book is not limited to DDBSs but also relates to centralized database systems and to database machines, which can often be considered as particular examples of DDBSs.

The book is intended primarily for DDBMS designers, but can also be of use to those who are engaged in the design and management of databases in general, as well as in problems of distributed system management such as distributed operating systems, computer networks, etc.

This text consists of five parts.

Part I is devoted to basic definitions and models. In Chapter 1 a model of DDBSs is presented and its components, the distributed database model and the transaction model, are discussed. Distributed database consistency is introduced next.
This chapter ends with a description of DDBS architecture. In Chapter 2 definitions are given of syntactic and semantic concurrency control models. For the syntactic model the serializability criterion of transaction execution correctness is discussed in relation to both mono- and multiversion DDBSs. Garcia-Molina's and Lynch's approaches are presented for the semantic model. Chapter 3 covers issues related to DDBS performance failures: deadlock, permanent blocking, cyclic and infinite restarting.

In Part II, Chapters 4, 5, 6, and 7 discuss concurrency control methods in monoversion DDBSs: the locking method, the timestamp ordering method, the validation method and hybrid methods. For each method the concept, the basic algorithms, a hierarchical version of the basic algorithms, and methods for avoiding performance failures are given.

In Part III, Chapters 8, 9, 10 cover concurrency control methods in multiversion DDBSs: the multiversion locking method, the multiversion timestamp ordering method and the multiversion validation method.

Concurrency control methods for the semantic concurrency model are given in Part IV. In Chapter 11, Garcia-Molina's locking algorithm, which uses the semantic criterion of transaction execution correctness, is presented and discussed. Chapter 12 is devoted to a locking algorithm which uses the abstract data type approach.

Part V is composed of five chapters concerning performance issues. Chapter 13 presents a general statement of the issue of performance in a DDBS with respect to the service received by a particular transaction. Chapter 14 discusses the effect of concurrency control algorithms in general on the DDBS's transaction processing capacity. Chapter 15 is devoted to the performance evaluation of global locking policies, while Chapter 16 analyses the performance issues related to locking policies based on individual data items or granules.
Finally, in Chapter 17 recent results on the performance of resequencing algorithms, such as timestamp ordering algorithms, are presented. The whole of Part V uses the performance evaluation methodology based on queuing models and simulation tools.

The book concludes with a comprehensive bibliography on the subject.

Acknowledgements

We would like to thank Dr. Geneviève Jomier from the University of Paris Sud (Orsay) in France and Prof. Jan Węglarz from the Technical University of Poznan in Poland, who have helped in organizing the cooperation which made this book possible.

We also thank those who have contributed to the preparation of the camera-ready form of this book: Catherine Vinet, Marisela Hernandez, Gilbert Harrus, Jerzy Strojny and Michal Jankowski.