Principles of Distributed Computing
Roger Wattenhofer
[email protected]
Spring 2016

Contents

1 Vertex Coloring
  1.1 Problem & Model
  1.2 Coloring Trees
2 Tree Algorithms
  2.1 Broadcast
  2.2 Convergecast
  2.3 BFS Tree Construction
  2.4 MST Construction
3 Leader Election
  3.1 Anonymous Leader Election
  3.2 Asynchronous Ring
  3.3 Lower Bounds
  3.4 Synchronous Ring
4 Distributed Sorting
  4.1 Array & Mesh
  4.2 Sorting Networks
  4.3 Counting Networks
5 Shared Memory
  5.1 Model
  5.2 Mutual Exclusion
  5.3 Store & Collect
    5.3.1 Problem Definition
    5.3.2 Splitters
    5.3.3 Binary Splitter Tree
    5.3.4 Splitter Matrix
6 Shared Objects
  6.1 Centralized Solutions
  6.2 Arrow and Friends
  6.3 Ivy and Friends
7 Maximal Independent Set
  7.1 MIS
  7.2 Original Fast MIS
  7.3 Fast MIS v2
  7.4 Applications
8 Locality Lower Bounds
  8.1 Model
  8.2 Locality
  8.3 The Neighborhood Graph
9 Social Networks
  9.1 Small World Networks
  9.2 Propagation Studies
10 Synchronization
  10.1 Basics
  10.2 Synchronizer α
  10.3 Synchronizer β
  10.4 Synchronizer γ
  10.5 Network Partition
  10.6 Clock Synchronization
11 Communication Complexity
  11.1 Diameter & APSP
  11.2 Lower Bound Graphs
  11.3 Communication Complexity
  11.4 Distributed Complexity Theory
12 Wireless Protocols
  12.1 Basics
  12.2 Initialization
    12.2.1 Non-Uniform Initialization
    12.2.2 Uniform Initialization with CD
    12.2.3 Uniform Initialization without CD
  12.3 Leader Election
    12.3.1 With High Probability
    12.3.2 Uniform Leader Election
    12.3.3 Fast Leader Election with CD
    12.3.4 Even Faster Leader Election with CD
    12.3.5 Lower Bound
    12.3.6 Uniform Asynchronous Wakeup without CD
  12.4 Useful Formulas
13 Stabilization
  13.1 Self-Stabilization
  13.2 Advanced Stabilization
14 Labeling Schemes
  14.1 Adjacency
  14.2 Rooted Trees
  14.3 Road Networks
15 Fault-Tolerance & Paxos
  15.1 Client/Server
  15.2 Paxos
16 Consensus
  16.1 Two Friends
  16.2 Consensus
  16.3 Impossibility of Consensus
  16.4 Randomized Consensus
  16.5 Shared Coin
17 Byzantine Agreement
  17.1 Validity
  17.2 How Many Byzantine Nodes?
  17.3 The King Algorithm
  17.4 Lower Bound on Number of Rounds
  17.5 Asynchronous Byzantine Agreement
18 Authenticated Agreement
  18.1 Agreement with Authentication
  18.2 Zyzzyva
19 Quorum Systems
  19.1 Load and Work
  19.2 Grid Quorum Systems
  19.3 Fault Tolerance
  19.4 Byzantine Quorum Systems
20 Eventual Consistency & Bitcoin
  20.1 Consistency, Availability and Partitions
  20.2 Bitcoin
  20.3 Smart Contracts
  20.4 Weak Consistency
21 Distributed Storage
  21.1 Consistent Hashing
  21.2 Hypercubic Networks
  21.3 DHT & Churn
22 Game Theory
  22.1 Introduction
  22.2 Prisoner’s Dilemma
  22.3 Selfish Caching
  22.4 Braess’ Paradox
  22.5 Rock-Paper-Scissors
  22.6 Mechanism Design
23 Dynamic Networks
  23.1 Synchronous Edge-Dynamic Networks
  23.2 Problem Definitions
  23.3 Basic Information Dissemination
  23.4 Small Messages
    23.4.1 k-Verification
    23.4.2 k-Committee Election
  23.5 More Stable Graphs
24 All-to-All Communication
25 Multi-Core Computing
  25.1 Introduction
    25.1.1 The Current State of Concurrent Programming
  25.2 Transactional Memory
  25.3 Contention Management
26 Dominating Set
  26.1 Sequential Greedy Algorithm
  26.2 Distributed Greedy Algorithm
27 Routing
  27.1 Array
  27.2 Mesh
  27.3 Routing in the Mesh with Small Queues
  27.4 Hot-Potato Routing
  27.5 More Models
28 Routing Strikes Back
  28.1 Butterfly
  28.2 Oblivious Routing
  28.3 Offline Routing

Introduction

What is Distributed Computing?

In the last few decades, we have experienced an unprecedented growth in the area of distributed systems and networks.
Distributed computing now encompasses many of the activities occurring in today’s computer and communications world. Indeed, distributed computing appears in quite diverse application areas: the Internet, wireless communication, cloud or parallel computing, multi-core systems, and mobile networks, but also an ant colony, a brain, or even human society can be modeled as distributed systems.

These applications have in common that many processors or entities (often called nodes) are active in the system at any moment. The nodes have certain degrees of freedom: they have their own hardware and software. Nevertheless, the nodes may share common resources and information, and, in order to solve a problem that concerns several, or maybe even all, nodes, coordination is necessary.

Despite these commonalities, a human brain is of course very different from a quad-core processor. Due to such differences, many different models and parameters are studied in the area of distributed computing. In some systems the nodes operate synchronously, in other systems they operate asynchronously. There are simple homogeneous systems, and heterogeneous systems where different types of nodes, potentially with different capabilities, objectives etc., need to interact. There are different communication techniques: nodes may communicate by exchanging messages, or by means of shared memory. Occasionally the communication infrastructure is tailor-made for an application; sometimes one has to work with whatever infrastructure is given. The nodes in a system often work together to solve a global task; occasionally the nodes are autonomous agents that have their own agenda and compete for common resources. Sometimes the nodes can be assumed to work correctly; at times they may exhibit failures. In contrast to a single-node system, distributed systems may still function correctly despite failures, as other nodes can take over the work of the failed nodes.
There are different kinds of failures that can be considered: nodes may just crash, or they might exhibit arbitrary, erroneous behavior, maybe even to a degree where it cannot be distinguished from malicious (also known as Byzantine) behavior. It is also possible that the nodes do follow the rules but tweak the parameters to get the most out of the system; in other words, the nodes act selfishly.

Apparently, there are many models (and even more combinations of models) that can be studied. We will not discuss them in detail now, but simply define them when we use them. Towards the end of the course a general picture should emerge, hopefully!

Course Overview

This course introduces the basic principles of distributed computing, highlighting common themes and techniques. In particular, we study some of the fundamental issues underlying the design of distributed systems:

• Communication: Communication does not come for free; often communication cost dominates the cost of local processing or storage. Sometimes we even assume that everything but communication is free.
• Coordination: How can you coordinate a distributed system so that it performs some task efficiently? How much overhead is inevitable?
• Fault-tolerance: A major advantage of a distributed system is that even in the presence of failures the system as a whole may survive.
• Locality: Networks keep growing. Luckily, global information is not always needed to solve a task; often it is sufficient if nodes talk to their neighbors. In this course, we will address whether a local solution is possible.
• Parallelism: How fast can you solve a task if you increase your computational power, e.g., by increasing the number of nodes that can share the workload? How much parallelism is possible for a given problem?
• Symmetry breaking: Sometimes some nodes need to be selected to orchestrate computation or communication. This is achieved by a technique called symmetry breaking.
• Synchronization: How can you implement a synchronous algorithm in an asynchronous environment?
• Uncertainty: If we need to agree on a single term that fittingly describes this course, it is probably “uncertainty”. As the whole system is distributed, the nodes cannot know what other nodes are doing at this exact moment, and the nodes are required to solve the tasks at hand despite the lack of global knowledge.

Finally, there are also a few areas that we will not cover in this course, mostly because these topics have become so important that they deserve their own courses. Examples of such topics are distributed programming and security/cryptography.

In summary, in this class we explore essential algorithmic ideas and lower bound techniques, basically the “pearls” of distributed computing and network algorithms. We will cover a fresh topic every week.

Have fun!

Chapter Notes

Many excellent textbooks have been written on the subject. The book closest to this course is by David Peleg [Pel00], as it shares about half of the material. A main focus of Peleg’s book is on network partitions, covers, decompositions, and spanners, an interesting area that we will only touch on in this course. There exists a multitude of other textbooks that overlap with one or two chapters of this course, e.g., [Lei92, Bar96, Lyn96, Tel01, AW04, HKP+05, CLRS09, Suo12]. Other related courses are by James Aspnes [Asp] and by Jukka Suomela [Suo14].

Some chapters of this course have been developed in collaboration with (former) Ph.D. students, see chapter notes for details. Many students have helped to improve exercises and script. Thanks go to Philipp Brandes, Raphael Eidenbenz, Roland Flury, Klaus-Tycho Förster, Stephan Holzer, Barbara Keller, Fabian Kuhn, Christoph Lenzen, Thomas Locher, Remo Meier, Thomas Moscibroda, Regina O’Dell, Yvonne-Anne Pignolet, Jochen Seidel, Stefan Schmid, Johannes Schneider, Jara Uitto, Pascal von Rickenbach (in alphabetical order).

Bibliography

[Asp] James Aspnes. Notes on Theory of Distributed Systems.
[AW04] Hagit Attiya and Jennifer Welch. Distributed Computing: Fundamentals, Simulations and Advanced Topics (2nd edition). John Wiley Interscience, March 2004.
[Bar96] Valmir C. Barbosa. An introduction to distributed algorithms. MIT Press, Cambridge, MA, USA, 1996.
[CLRS09] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms (3rd edition). MIT Press, 2009.
[HKP+05] Juraj Hromkovic, Ralf Klasing, Andrzej Pelc, Peter Ruzicka, and Walter Unger. Dissemination of Information in Communication Networks - Broadcasting, Gossiping, Leader Election, and Fault-Tolerance. Texts in Theoretical Computer Science. An EATCS Series. Springer, 2005.
[Lei92] F. Thomson Leighton. Introduction to parallel algorithms and architectures: array, trees, hypercubes. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1992.
[Lyn96] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996.
[Pel00] David Peleg. Distributed Computing: a Locality-Sensitive Approach. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.
[Suo12] Jukka Suomela. Deterministic Distributed Algorithms, 2012.
[Suo14] Jukka Suomela. Distributed algorithms. Online textbook, 2014.
[Tel01] Gerard Tel. Introduction to Distributed Algorithms. Cambridge University Press, New York, NY, USA, 2nd edition, 2001.