Distributed k-ary System: Algorithms for Distributed Hash Tables

ALI GHODSI

A dissertation submitted to the Royal Institute of Technology (KTH) in partial fulfillment of the requirements for the degree of Doctor of Philosophy, December 2006.

The Royal Institute of Technology (KTH)
School of Information and Communication Technology
Department of Electronic, Computer, and Software Systems
Stockholm, Sweden

TRITA-ICT/ECS AVH 06:09
ISSN 1653-6363
ISRN KTH/ICT/ECS AVH-06/09–SE

SICS Dissertation Series 45
ISSN 1101-1335
ISRN SICS-D–45–SE

© Ali Ghodsi, 2006

Abstract

This dissertation presents algorithms for data structures called distributed hash tables (DHTs) or structured overlay networks, which are used to build scalable, self-managing distributed systems. The provided algorithms guarantee lookup consistency in the presence of dynamism: they guarantee consistent lookup results while nodes join and leave. Similarly, the algorithms guarantee that routing never fails while nodes join and leave. Previous algorithms for lookup consistency either suffer from starvation, do not work in the presence of failures, or lack a proof of correctness.

Several group communication algorithms for structured overlay networks are presented. We provide an overlay broadcast algorithm which, unlike previous algorithms, avoids redundant messages, reaching all nodes in O(log n) time while using O(n) messages, where n is the number of nodes in the system. The broadcast algorithm is used to build overlay multicast.

We introduce bulk operations, which enable a node to efficiently make multiple lookups or send a message to all nodes in a specified set of identifiers. The algorithm ensures that all specified nodes are reached in O(log n) time, sending at most O(log n) messages per node, regardless of the input size of the bulk operation. Moreover, the algorithm avoids sending redundant messages. Previous approaches required multiple lookups, which consume more messages and can render the initiator a bottleneck. Our algorithms are used in DHT-based storage systems, where nodes can do thousands of lookups to fetch large files. We use the bulk operation algorithm to construct a pseudo-reliable broadcast algorithm. Bulk operations can also be used to implement efficient range queries.
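The broadcast and bulk operation results both rest on an interval-splitting idea: a node divides the part of the identifier ring it must cover among its routing pointers, so that no node is reached twice. The following minimal sketch illustrates that idea on a Chord-like ring with power-of-two fingers; the Ring and Node classes and all names are illustrative stand-ins for this sketch, not DKS code.

    from bisect import bisect_left

    N = 2 ** 6  # identifier space size, a toy value for the demo

    def in_interval(x, a, b):
        # True iff x lies in the ring interval (a, b), wrapping mod N;
        # (a, a) denotes the whole ring except a itself.
        if a == b:
            return x != a
        return (x - a) % N < (b - a) % N

    class Node:
        def __init__(self, ident, ring):
            self.id = ident
            self.ring = ring  # global view, standing in for real routing state

        def fingers(self):
            # Successors of id + 2**i for i = 0..log2(N)-1, deduplicated;
            # they come out sorted by increasing ring distance from self.id.
            fs = []
            for i in range(N.bit_length() - 1):
                succ = self.ring.successor((self.id + 2 ** i) % N)
                if succ != self.id and succ not in fs:
                    fs.append(succ)
            return fs

        def broadcast(self, msg, limit):
            # Deliver locally, then delegate to each finger u the interval
            # (u, next finger), so every node is reached exactly once.
            print(f"node {self.id:2d} got {msg!r}")
            fs = [u for u in self.fingers() if in_interval(u, self.id, limit)]
            for u, next_limit in zip(fs, fs[1:] + [limit]):
                self.ring.node(u).broadcast(msg, next_limit)

    class Ring:
        # Toy stand-in for the overlay: a sorted list of node identifiers.
        def __init__(self, ids):
            self.ids = sorted(ids)
            self.members = {i: Node(i, self) for i in self.ids}

        def successor(self, k):
            return self.ids[bisect_left(self.ids, k) % len(self.ids)]

        def node(self, ident):
            return self.members[ident]

    ring = Ring([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
    ring.node(8).broadcast("hello", limit=8)  # limit = own id: cover the ring

Starting the broadcast with limit equal to the initiator's own identifier covers the whole ring with exactly n - 1 messages, consistent with the O(n) message and O(log n) time bounds stated above.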
Finally, we describe a novel way to place replicas in a DHT, called symmetric replication, that enables parallel recursive lookups. Parallel lookups are known to reduce latencies; previously, however, costly iterative lookups had to be used to do parallel lookups. Moreover, joins and leaves require exchanging only O(1) messages, while other schemes require at least log(f) messages for a replication degree of f.

The algorithms have been implemented in a middleware called the Distributed k-ary System (DKS), which is briefly described.

Keywords: distributed hash tables, structured overlay networks, distributed algorithms, distributed systems, group communication, replication
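The placement rule behind symmetric replication is simple to state: for an identifier space of size N and a replication degree f that divides N, identifier i is associated with the f identifiers (i + j*N/f) mod N for j = 0, ..., f - 1. Below is a minimal sketch of this rule; the constants and the function name replica_ids are illustrative choices, not DKS code.

    N = 2 ** 6  # identifier space size, a toy value
    f = 4       # replication degree; must divide N

    def replica_ids(i, n=N, degree=f):
        # Equivalence class of identifier i under symmetric replication:
        # {(i + j * n/degree) mod n : j = 0..degree-1}.
        step = n // degree
        return [(i + j * step) % n for j in range(degree)]

    print(replica_ids(11))  # [11, 27, 43, 59]
    print(replica_ids(43))  # [43, 59, 11, 27]: same class from any member

Because every member of a class maps back to the same class, a lookup for i can be sent to all f replicas in parallel using ordinary recursive lookups, and the identifier-to-replica association never changes as nodes join or leave, which is what keeps join and leave costs at O(1) messages.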
To Neda, Anooshé, Javad, and Nahid

Acknowledgments

I truly feel privileged to have worked under the supervision of my advisor, Professor Seif Haridi. He has an impressive breadth and depth in computer science, which he gladly shares with his students. He also meticulously studied the research problems and helped with every bit of the research. I am also immensely grateful to Professor Luc Onana Alima, who during my first two years as a doctoral student worked with me side by side and introduced me to the area of distributed computing and distributed hash tables. He also taught me how to write a research paper by carefully walking me through my first one. Together, Seif and Luc deserve most of the credit for the work on the DKS system, on which this dissertation is based.

During the year 2006, I had the pleasure of working with Professor Roland Yap from the National University of Singapore. I would like to thank him for all the discussions and detailed readings of this dissertation. I would also like to thank Professor Bernardo Huberman at HP Labs Palo Alto, who let me work on this dissertation while staying with his group during the summer of 2006.

During my doctoral studies, I am happy to have worked with Sameh El-Ansary, who contributed to many of the algorithms and papers on DKS. I would also like to thank Joe Armstrong, Per Brand, Frej Drejhammar, Erik Klintskog, Janusz Launberg, and Babak Sadighi for the many fruitful and enlightening discussions in the stimulating environment provided by SICS.

I would like to show my gratitude to those who read and commented on drafts of this dissertation: Professor Rassul Ayani, Sverker Janson, Johan Montelius, Vicki Carleson, and Professor Vladimir Vlassov. In particular, I thank Cosmin Arad, who took time to give detailed comments on the whole dissertation. I also thank Professor Christian Schulte for making me realize, in the eleventh hour, that my first chapter needed to be rewritten. I acknowledge the help and support given to me by the director of graduate studies, Professor Robert Rönngren, and the Prefekt, Thomas Sjöland.

Finally, I take this opportunity to show my deepest gratitude to my family. I am eternally grateful to my beloved Neda Kerimi, for always showing endless love and patience during good times and bad times. I also would like to express my profound gratitude to my dear sister Anooshé, and my parents, Javad and Nahid, for their continuous support and encouragement.

Contents

List of Figures  xiii
List of Algorithms  xv

1 Introduction  1
  1.1 What is a Distributed Hash Table?  1
  1.2 Efficiency of DHTs  6
    1.2.1 Number of Hops and Routing Table Size  6
    1.2.2 Routing Latency  8
  1.3 Properties of DHTs  10
  1.4 Security and Trust  11
  1.5 Functionality of DHTs  12
  1.6 Applications on top of DHTs  14
    1.6.1 Storage Systems  14
    1.6.2 Host Discovery and Mobility  15
    1.6.3 Web Caching and Web Servers  16
    1.6.4 Other uses of DHTs  16
  1.7 Contributions  17
    1.7.1 Lookup Consistency  17
    1.7.2 Group Communication  18
    1.7.3 Bulk Operations  19
    1.7.4 Replication  19
    1.7.5 Philosophy  20
  1.8 Organization  21

2 Preliminaries  23
  2.1 System Model  23
    2.1.1 Failures  24
  2.2 Algorithm Descriptions  24
    2.2.1 Event-driven Notation  25
    2.2.2 Control-oriented Notation  26
    2.2.3 Algorithm Complexity  28
  2.3 A Typical DHT  29
    2.3.1 Formal Definitions  30
    2.3.2 Interval Notation  31
    2.3.3 Distributed Hash Tables  31
    2.3.4 Handling Dynamism  32

3 Atomic Ring Maintenance  37
  3.1 Problems Due to Dynamism  38
  3.2 Concurrency Control  40
    3.2.1 Safety  41
    3.2.2 Liveness  46
  3.3 Lookup Consistency  54
    3.3.1 Lookup Consistency in the Presence of Joins  55
    3.3.2 Lookup Consistency in the Presence of Leaves  57
    3.3.3 Data Management in Distributed Hash Tables  59
    3.3.4 Lookups With Joins and Leaves  60
  3.4 Optimized Atomic Ring Maintenance  63
    3.4.1 The Join Algorithm  64
    3.4.2 The Leave Algorithm  68
  3.5 Dealing With Failures  69
    3.5.1 Periodic Stabilization and Successor-lists  75
    3.5.2 Modified Periodic Stabilization  79
  3.6 Related Work  81

4 Routing and Maintenance  83
  4.1 Additional Pointers as in Chord  83
  4.2 Lookup Strategies  85
    4.2.1 Recursive Lookup  86
    4.2.2 Iterative Lookup  89
    4.2.3 Transitive Lookup  91
  4.3 Greedy Lookup Algorithm  93
    4.3.1 Routing with Atomic Ring Maintenance  95
  4.4 Improved Lookups with the k-ary Principle  96
    4.4.1 Monotonically Increasing Pointers  99
  4.5 Topology Maintenance  101
    4.5.1 Efficient Maintenance in the Presence of Failures  101
    4.5.2 Atomic Maintenance with Additional Pointers  103
